Hyper Parameter Tuning for Deep Neural Networks

By Jaini Bhansali, under the guidance of Professor Nik Bear Brown

                                                                                                         22nd April 2018

Introduction

The impact of Neural Networks and deep Neural Networks has been immensely strong in the recent past. With the increased utilization of Deep Neural Networks there is parallely an increase to improve the performance of the deep neural network. This blog aims to help the user with the following:

  1. Perform Hyper parameter tuning in steps using the Tensorflow Library by Google
  2. Select hyper parameter values based on various performance measures such as Accuracy and Loss
  3. Guide the user on various prospective hyper parameters to tune for various types of deep neural network models

Brief overview of Hyper Parameters

Hyper Parameters control the performance of a deep neural network irrespective of the dataset . These parameters are set before the learning of the deep neural network begins. Given the hyper parameters , the algorithm learns the parameters from the data. The various Hyper parameters used include learning rate, number of epochs, Network initialization, Number of visible layers, gradient estimation to name a few.

Hyper Parameter Tuning Methods

There are many approaches available for hyperparameter tuning which are as follows:

  1. Manual Search (Single parameter) : Tuning Hyperparameters one at time and studing its impact on the accuracy of the model.
  2. Manual Search (Multiple parameter) : Tune multiple parameters simultaneously and observe the change in accuracy
  3. Use a grid search that would do various kinds of searches such as Random Search, Set parameters search

I have implemented Manual Search Single Parameter and Manual Search Multiple Parameter in this blog.

Hyper Parameter Definitions

Bias

A bias is used to ensure that even if there are no input features to yet provide a positive output Hence, the inputs and Weights are treated as Vectors and used as a dot product mathematically.

Activation Function

There are various kinds of activation functions like Sigmoid, Tanh (Hyperbolic Tangent), ReLU (Rectified Linear Unit) to name a few. Each of the activation functions have various applications in different scenarios. For example, Sigmoid functions are popularly used as they transform the input features as an output that lies between 0 and 1 , making it very popular for to represent probabilities. Activation functions are used to introduce nonlinearity into the network for better computation of the output.

Loss in a Neural netwrok

In Simple words the loss function is used to tell us how wrong our predicted value is from the ground truth. The empirical Loss is the Loss calculated over the entire dataset. The loss sometimes referred using different names such as Objective Function, Cost Function or Empirical Risk.

Cross Entropy Loss

This is mainly used with models that output a probability between 0 and 1 , i.e. mainly classification problems. Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. A perfect model will have a log loss of 0. As the predicted probability approached 1 the log loss decreases.

Mean Squared Error Loss

This is mainly used in regression models that output continuous variables. ( In Simple words as the values are further from the predicted value the Mean Squared Error Loss Increases) Hence, the Loss depicts the neural network and how well our Neural Network is doing. The aim would be to reduce the loss and make it minimum. Additionally, the loss is a function of Network Weights.

Loss Optimization

Since, the loss is a function of Network Weights, we can find the weights that contribute to minimum loss.

Loss Scenario

image.png

Learning Rate

In the context of the above loss landscape, the learning rate measures how large the step is towards the descent. Setting the learning rate can be a challenge as it must not be too low that it gets lost in the local minima, also must not be too large such that the model would diverge and blow up.

Regularization

Regularization constrains our optimization problem to discourage complex models and used to generalize our model on unseen data.

Drop Outs

Process used to randomly drop neurons such that activation becomes 0. This ensures the the network does not randomly rely on a few neuron or provide larger weights to neurons.

Early Stopping

Stop training before we have a chance to overfit. In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method. Such methods update the learner so as to make it better fit the training data with each iteration

Different Types of Deep Neural Networks and Datasets used for Hyper Parameter Tuning

The following neural networks and the repective datasets will be used for hyper parameter tuning

  1. Multi Layer Perceptron (MLP) using the IRIS Dataset
  2. Convolutional Neural Network (CNN) using CIFAR 10 dataset
  3. Recurrent Neural Network (RNN) using the HASYV2 Dataset
  4. Restricted Boltzmann Machine (RBM) using MNIST dataset
  5. Generative Adversarial Network (GAN) using MNIST dataset
  6. Autoencoders using MNIST dataset

Each section of the blog provides a basic understanding of the deep neural network being used and explanation of the Neural Network Structure.

Following are the hyper parameters that have beem selected for tuning for each neural network model.

MLP and CNN

image.png

RNN and RBM

image.png

GAN and Autoencoder

image.png

For more information to learn about various hyper parameters and deep neural networks refer to the below video tutorial

  1. Introduction to Deep Nural Networks - MIT 6.S191 https://www.youtube.com/watch?v=a5BUunInTQU&t=1227s

  2. INtroduction of Deep Learning Schedule http://introtodeeplearning.com/#schedule

  3. An Overview of the definition of each hyper parameter used in the blog is provided in the following link https://github.com/jainibhansali/BDIA

Pre- Requisite Knowledge

  1. Intermediate understanding of Python
  2. Use a Jupyter Notebook
  3. Basic knowledge of Deep Neural Networks
  4. Basic Knowledge of the Tensorflow Library

Multi Layer Perceptron Neural Network

A brief Introduction to the Multi Layer Perceptron

The Perceptron is the fundamental building block of a Neural Network. A perceptron is a single Neuron in a Neural Network. The Multi Layer Percepton as the name suggests consists of multiple layers of the perceptrons. MLP's are a simple class of feed forward neural networks which consists of atleast 3 layers of nodes. It is trained by the process of back propagation and each layer of nodes except the input nodes are applied with a non linear activation function. MLP’s are very good classifier algorithm.

Detail of the Stucture of MLP Implemented

The Multi Layered Perceptron (MLP) model used consists of 10 hidden layers. Gaussian Initialization was used to initialize he weights and bias of the network. It was first initialized with a learning rate of 0.01, activation function Relu and using the Softmax Cross Entropy with Logits as the cost function. It traverses the gradient with the gradient estimation Adam Optimizer.

Dataset description

The Iris Dataset is a multivariate dataset. It consists of 3 species of Iris flowers (Iris Setosa, Iris Virginica, Iris Versicolor) of 50 samples each. The dataset consists of petal and sepal width and length respectively for each specie. Based on the 4 features we would be using the MLP Classifier to classify the Iris Specie type.

This dataset has been selected as the Iris Dataset is a popular Hello World Dataset that is used for classification. It would be a good dataset to perform hyper parameter tuning.

Steps to use the Dataset

  1. Download the Iris Dataset from the below link

    https://archive.ics.uci.edu/ml/machine-learning-databases/iris/

  2. Right click and on the web page select save.

  3. While saving the file save it as 'iris.csv' in the path of the jupyter notebook.

A sample of the path in the Jupyter Notebook is shown below:

image.png

Tensorflow Code for MLP Neural Network

The tensorflow code involves creating placefolders 'X' and placeholders 'Y' that store the inputs and the output values of the Neural Network. Next the weights and bias are set for the visible and hidden layers for initializing the network . Following which we can set the hyper parameters which are the learning rate, number of epochs, number of hidden layers, cost and loss function for the network. In order to train the algorithm we must initilize the Tensorflow session. Withing this session we loop through each epoch to train the neural network and evaluate its performance on the basis of iris labels . Next, we evaluate the test set within the session.

Here, n_input=4 here as there are 4 features sepal length, sepal width, petal width and petal length

n_output = 3 has the algorithm is tasked to assign the out put to one of the three kinds of species.

Let see what the code looks like.

Follow the comments for explanation and understanding. First we preprocess the Iris Data

Lets import the necessary libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
import time
C:\Users\jaini\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

The Iris Dataset has 4 features the Sepal Length , Sepal Width, Petal Length and Petal Width. The labels which are the name of varios Iris Species are converted into one hot encoded arrays for easy reading of the algorithm. This is done using the label encode values.

Lets see how the dataset looks

In [46]:
#read the iris data into a dataframe
dataframe = pd.read_csv('Iris.csv')
dataframe
Out[46]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
5 6 5.4 3.9 1.7 0.4 Iris-setosa
6 7 4.6 3.4 1.4 0.3 Iris-setosa
7 8 5.0 3.4 1.5 0.2 Iris-setosa
8 9 4.4 2.9 1.4 0.2 Iris-setosa
9 10 4.9 3.1 1.5 0.1 Iris-setosa
10 11 5.4 3.7 1.5 0.2 Iris-setosa
11 12 4.8 3.4 1.6 0.2 Iris-setosa
12 13 4.8 3.0 1.4 0.1 Iris-setosa
13 14 4.3 3.0 1.1 0.1 Iris-setosa
14 15 5.8 4.0 1.2 0.2 Iris-setosa
15 16 5.7 4.4 1.5 0.4 Iris-setosa
16 17 5.4 3.9 1.3 0.4 Iris-setosa
17 18 5.1 3.5 1.4 0.3 Iris-setosa
18 19 5.7 3.8 1.7 0.3 Iris-setosa
19 20 5.1 3.8 1.5 0.3 Iris-setosa
20 21 5.4 3.4 1.7 0.2 Iris-setosa
21 22 5.1 3.7 1.5 0.4 Iris-setosa
22 23 4.6 3.6 1.0 0.2 Iris-setosa
23 24 5.1 3.3 1.7 0.5 Iris-setosa
24 25 4.8 3.4 1.9 0.2 Iris-setosa
25 26 5.0 3.0 1.6 0.2 Iris-setosa
26 27 5.0 3.4 1.6 0.4 Iris-setosa
27 28 5.2 3.5 1.5 0.2 Iris-setosa
28 29 5.2 3.4 1.4 0.2 Iris-setosa
29 30 4.7 3.2 1.6 0.2 Iris-setosa
... ... ... ... ... ... ...
120 121 6.9 3.2 5.7 2.3 Iris-virginica
121 122 5.6 2.8 4.9 2.0 Iris-virginica
122 123 7.7 2.8 6.7 2.0 Iris-virginica
123 124 6.3 2.7 4.9 1.8 Iris-virginica
124 125 6.7 3.3 5.7 2.1 Iris-virginica
125 126 7.2 3.2 6.0 1.8 Iris-virginica
126 127 6.2 2.8 4.8 1.8 Iris-virginica
127 128 6.1 3.0 4.9 1.8 Iris-virginica
128 129 6.4 2.8 5.6 2.1 Iris-virginica
129 130 7.2 3.0 5.8 1.6 Iris-virginica
130 131 7.4 2.8 6.1 1.9 Iris-virginica
131 132 7.9 3.8 6.4 2.0 Iris-virginica
132 133 6.4 2.8 5.6 2.2 Iris-virginica
133 134 6.3 2.8 5.1 1.5 Iris-virginica
134 135 6.1 2.6 5.6 1.4 Iris-virginica
135 136 7.7 3.0 6.1 2.3 Iris-virginica
136 137 6.3 3.4 5.6 2.4 Iris-virginica
137 138 6.4 3.1 5.5 1.8 Iris-virginica
138 139 6.0 3.0 4.8 1.8 Iris-virginica
139 140 6.9 3.1 5.4 2.1 Iris-virginica
140 141 6.7 3.1 5.6 2.4 Iris-virginica
141 142 6.9 3.1 5.1 2.3 Iris-virginica
142 143 5.8 2.7 5.1 1.9 Iris-virginica
143 144 6.8 3.2 5.9 2.3 Iris-virginica
144 145 6.7 3.3 5.7 2.5 Iris-virginica
145 146 6.7 3.0 5.2 2.3 Iris-virginica
146 147 6.3 2.5 5.0 1.9 Iris-virginica
147 148 6.5 3.0 5.2 2.0 Iris-virginica
148 149 6.2 3.4 5.4 2.3 Iris-virginica
149 150 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 6 columns

In [48]:
start_time=time.time()
# create a function to label encode each Iris Specie. According to the function the algorithm will identify 
#Iris Setosa as [1,0,0], Iris Versicolor as [0,1,0] and Iris Virginica as [0,0,1]
def label_encode(label):
	val=[]
	if label == "Iris-setosa":
		val = [1,0,0]
	elif label == "Iris-versicolor":
		val = [0,1,0]
	elif label == "Iris-virginica":
		val = [0,0,1]	
	return val
# next we assign each array to a variable
s=np.array([1,0,0])
ve=np.array([0,1,0])
vi=np.array([0,0,1])

# this array is then assigned to each specie in the dataset
dataframe['Species'] = dataframe['Species'].map({'Iris-setosa': s, 'Iris-versicolor': ve,'Iris-virginica':vi})

# we are rearranging the dataset to break the arrangement in the dataset
dataframe=dataframe.iloc[np.random.permutation(len(dataframe))]

#reset the index
dataframe=dataframe.reset_index(drop=True)

# dividing the data set into train and test
#train data
x_input=dataframe.ix[0:105,['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
temp=dataframe['Species']
y_input=temp[0:106]
#test data
x_test=dataframe.ix[106:149,['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
y_test=temp[106:150]
C:\Users\jaini\Anaconda3\lib\site-packages\ipykernel_launcher.py:29: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate_ix
Lets look at the dataframe after transformation
In [49]:
dataframe
Out[49]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 72 6.1 2.8 4.0 1.3 [0, 1, 0]
1 38 4.9 3.1 1.5 0.1 [1, 0, 0]
2 59 6.6 2.9 4.6 1.3 [0, 1, 0]
3 55 6.5 2.8 4.6 1.5 [0, 1, 0]
4 96 5.7 3.0 4.2 1.2 [0, 1, 0]
5 104 6.3 2.9 5.6 1.8 [0, 0, 1]
6 12 4.8 3.4 1.6 0.2 [1, 0, 0]
7 123 7.7 2.8 6.7 2.0 [0, 0, 1]
8 41 5.0 3.5 1.3 0.3 [1, 0, 0]
9 88 6.3 2.3 4.4 1.3 [0, 1, 0]
10 110 7.2 3.6 6.1 2.5 [0, 0, 1]
11 37 5.5 3.5 1.3 0.2 [1, 0, 0]
12 145 6.7 3.3 5.7 2.5 [0, 0, 1]
13 131 7.4 2.8 6.1 1.9 [0, 0, 1]
14 150 5.9 3.0 5.1 1.8 [0, 0, 1]
15 6 5.4 3.9 1.7 0.4 [1, 0, 0]
16 139 6.0 3.0 4.8 1.8 [0, 0, 1]
17 119 7.7 2.6 6.9 2.3 [0, 0, 1]
18 66 6.7 3.1 4.4 1.4 [0, 1, 0]
19 118 7.7 3.8 6.7 2.2 [0, 0, 1]
20 112 6.4 2.7 5.3 1.9 [0, 0, 1]
21 13 4.8 3.0 1.4 0.1 [1, 0, 0]
22 27 5.0 3.4 1.6 0.4 [1, 0, 0]
23 31 4.8 3.1 1.6 0.2 [1, 0, 0]
24 40 5.1 3.4 1.5 0.2 [1, 0, 0]
25 97 5.7 2.9 4.2 1.3 [0, 1, 0]
26 3 4.7 3.2 1.3 0.2 [1, 0, 0]
27 125 6.7 3.3 5.7 2.1 [0, 0, 1]
28 98 6.2 2.9 4.3 1.3 [0, 1, 0]
29 77 6.8 2.8 4.8 1.4 [0, 1, 0]
... ... ... ... ... ... ...
120 56 5.7 2.8 4.5 1.3 [0, 1, 0]
121 71 5.9 3.2 4.8 1.8 [0, 1, 0]
122 148 6.5 3.0 5.2 2.0 [0, 0, 1]
123 113 6.8 3.0 5.5 2.1 [0, 0, 1]
124 26 5.0 3.0 1.6 0.2 [1, 0, 0]
125 73 6.3 2.5 4.9 1.5 [0, 1, 0]
126 142 6.9 3.1 5.1 2.3 [0, 0, 1]
127 106 7.6 3.0 6.6 2.1 [0, 0, 1]
128 48 4.6 3.2 1.4 0.2 [1, 0, 0]
129 18 5.1 3.5 1.4 0.3 [1, 0, 0]
130 86 6.0 3.4 4.5 1.6 [0, 1, 0]
131 92 6.1 3.0 4.6 1.4 [0, 1, 0]
132 93 5.8 2.6 4.0 1.2 [0, 1, 0]
133 29 5.2 3.4 1.4 0.2 [1, 0, 0]
134 45 5.1 3.8 1.9 0.4 [1, 0, 0]
135 69 6.2 2.2 4.5 1.5 [0, 1, 0]
136 62 5.9 3.0 4.2 1.5 [0, 1, 0]
137 33 5.2 4.1 1.5 0.1 [1, 0, 0]
138 143 5.8 2.7 5.1 1.9 [0, 0, 1]
139 116 6.4 3.2 5.3 2.3 [0, 0, 1]
140 102 5.8 2.7 5.1 1.9 [0, 0, 1]
141 16 5.7 4.4 1.5 0.4 [1, 0, 0]
142 129 6.4 2.8 5.6 2.1 [0, 0, 1]
143 115 5.8 2.8 5.1 2.4 [0, 0, 1]
144 1 5.1 3.5 1.4 0.2 [1, 0, 0]
145 7 4.6 3.4 1.4 0.3 [1, 0, 0]
146 9 4.4 2.9 1.4 0.2 [1, 0, 0]
147 68 5.8 2.7 4.1 1.0 [0, 1, 0]
148 15 5.8 4.0 1.2 0.2 [1, 0, 0]
149 5 5.0 3.6 1.4 0.2 [1, 0, 0]

150 rows × 6 columns

We see that the dataset is now ready for the MLP model

Next we will define functions that will help plot the loss vs epoch and accuracy vs epoch. There are 3 lists that are used in the below functions. The Epoch_List, train_loss and train_accuracy stores number of epochs, training loss and the training accuracy respectively.

In [19]:
# plot train loss vs epoch
def plot_loss():
    plt.figure(figsize=(18, 5))
    plt.subplot(1, 2, 1)
    plt.title('Train Loss vs Epoch', fontsize=15)
    plt.plot(epoch_list, train_loss, 'r-')
    plt.xlabel('Epoch')
    plt.ylabel('Train Loss')

# plot train accuracy vs epoch
def plot_accuracy():
    plt.subplot(1, 2, 2)
    plt.title('Train Accuracy vs Epoch', fontsize=15)
    plt.plot(epoch_list, train_accuracy, 'b-')
    plt.xlabel('Epoch')
    plt.ylabel('Train Accuracy')
    plt.show()

Now lets define the training model. The following steps are implemeted below :

  1. Define a function that applies weights and bias to the model
  2. Define the Network Initialization as Gaussian for weights and bias
  3. Define Hyper Parameters Cost, Optimization, Learning rate and number of epochs
  4. Start the Tensorflow Session and iterate through each epoch to train the model.
  5. The evaluate function is used to measure to the the performace of the algorithm

Follow the comments to understand step by step

In [27]:
#The below three list will store values of epochs, training loss and testing loss
epoch_list=[]
train_accuracy=[]
train_loss=[]

# Define a function that defines that takes the model and apploes weights and bias

def model(x, weights, bias):
    #weights and bias for the hidden layers
	layer_1 = tf.add(tf.matmul(x, weights["hidden"]), bias["hidden"])
    # appliy non linearity to the first layer
	layer_1 = tf.nn.relu(layer_1)
    
    # weights and bias applied to to the output layer
	output_layer = tf.matmul(layer_1, weights["output"]) + bias["output"]
	return output_layer

# Defining the Learning Rate and Number of epochs
learning_rate=0.01
training_epochs=1000
#display steps is defined to display results after 200 steps
display_steps=200

# n_input=4 here as there are 4 features sepal length, sepal width, petal width and petal length
n_input=4
n_hidden=10
#n_output=3 as there are 3 types of outputs as Iris Versicolor , Iris Setosa and iris virginica
n_output=3

X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_output])

#weights and biases
# Network Initialization
weights={
  "hidden":tf.Variable(tf.random_normal([n_input,n_hidden],name="weight_hidden")),
  "output" : tf.Variable(tf.random_normal([n_hidden, n_output]), name="weight_output")
}
bias = {
	"hidden" : tf.Variable(tf.random_normal([n_hidden]), name="bias_hidden"),
	"output" : tf.Variable(tf.random_normal([n_output]), name="bias_output")
}

# Call the function that applies the weights and bias to the model
pred = model(X, weights, bias)

#Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=Y))

# next, we reduce the cost of optimization
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

#Initialize Global Variables
init = tf.global_variables_initializer()

#start_time=time.time()

# start the tensorflow session
with tf.Session() as sess:
  sess.run(init)
  for epoch in range(training_epochs):
# each record of the training set is applied with the optimizer and the cost whicch is fed through the feed dict
      _, c = sess.run([optimizer, cost], feed_dict={X: x_input, Y:[t for t in y_input.as_matrix()]})
# in order to track progress , print display steps
      if(epoch + 1) % display_steps == 0:
        print("Epoch: ", (epoch+1), "Cost: ", c)
#   print("Optimization Finished!")

        test_result = sess.run(pred, feed_dict={X: x_input})
    
# calculation within the tensor

        correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
        
# evaluating training accuracy

        accuracy_final=accuracy.eval({X: x_input, Y:[t for t in y_input.as_matrix()]})
#   print "Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]})
        print ("Accuracy:", accuracy_final)
    
# append the epoch, trainin loss and training accuracy into list to visualize the graphs

        epoch_list.append(epoch)
        train_loss.append(c)
        train_accuracy.append(accuracy_final)

# evaluating testing accuracy 

  test_result = sess.run(pred, feed_dict={X: x_test})
  correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
  print ("Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]}))

#Plot the graphs

plot_loss()
plot_accuracy() 
end_time = time.time()
print ("Completed in ", end_time - start_time , " seconds")

        
Epoch:  200 Cost:  0.7585987
Accuracy: 0.745283
Epoch:  400 Cost:  0.45963374
Accuracy: 0.9056604
Epoch:  600 Cost:  0.35440665
Accuracy: 0.9433962
Epoch:  800 Cost:  0.2707192
Accuracy: 0.9622642
Epoch:  1000 Cost:  0.20767705
Accuracy: 0.9716981
Accuracy Test: 0.97727275
Completed in  9387.716165781021  seconds

Initial Observation for the MLP Model:

With the enlisted parameters we observe that the loss decreases as the number of epochs increases which is the ideal scenario. We also observe that the accuracy also increases as the number of epochs increases. The Training accuracy = 97.76 % and Testing Accuracy is 97.72%. It is seen that the testing accuracy is slightly greater than the training accuracy that can be attributed to the random seed of the network

Lets start hyper parameter tuning. We will tune the activation functions, Gradient Estimation and a combination of number of epochs and learning rate

Note

In the case of all tuning techniques it is observed that retraining the model might increase or decrease the accuracy slightly due to the random seed. The random seed has not been set to avoid creating a biased model.

Hyper Parameter Tuning the MLP Model

The initial model was initialized with number of epochs as 1000 and learning rate as 0.01. Now we will observe the performance of the model with learning rate of 0.1 and number of epochs as 100.

Lets read and preprocess the data as we did earlier

In [31]:
dataframe = pd.read_csv('Iris.csv')

start_time=time.time()
def label_encode(label):
	val=[]
	if label == "Iris-setosa":
		val = [1,0,0]
	elif label == "Iris-versicolor":
		val = [0,1,0]
	elif label == "Iris-virginica":
		val = [0,0,1]	
	return val

s=np.array([1,0,0])
ve=np.array([0,1,0])
vi=np.array([0,0,1])
dataframe['Species'] = dataframe['Species'].map({'Iris-setosa': s, 'Iris-versicolor': ve,'Iris-virginica':vi})

dataframe=dataframe.iloc[np.random.permutation(len(dataframe))]

dataframe=dataframe.reset_index(drop=True)

#train data
x_input=dataframe.ix[0:105,['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
temp=dataframe['Species']
y_input=temp[0:106]
#test data
x_test=dataframe.ix[106:149,['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
y_test=temp[106:150]
C:\Users\jaini\Anaconda3\lib\site-packages\ipykernel_launcher.py:24: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate_ix

Edit the hyper parameter learning_rate =0.1 and set training_epochs= 100 with remaining code resued from the initial model.

In [33]:
epoch_list=[]
train_accuracy=[]
train_loss=[]

start_time=time.time()
#Defining the Multiple Layer Perceptron
def model(x, weights, bias):
	layer_1 = tf.add(tf.matmul(x, weights["hidden"]), bias["hidden"])
	layer_1 = tf.nn.relu(layer_1)

	output_layer = tf.matmul(layer_1, weights["output"]) + bias["output"]
	return output_layer
# hyperparameter tuning
learning_rate=0.1
training_epochs=100
display_steps=10
# network parameters
n_input=4
n_hidden=10
n_output=3
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_output])
#weights and biases
weights={
  "hidden":tf.Variable(tf.random_normal([n_input,n_hidden],name="weight_hidden")),
  "output" : tf.Variable(tf.random_normal([n_hidden, n_output]), name="weight_output")
}
bias = {
	"hidden" : tf.Variable(tf.random_normal([n_hidden]), name="bias_hidden"),
	"output" : tf.Variable(tf.random_normal([n_output]), name="bias_output")
}
#Define Model
pred = model(X, weights, bias)
#Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
#Initialize Global Variables
init = tf.global_variables_initializer()
start_time=time.time()
with tf.Session() as sess:
  sess.run(init)
  for epoch in range(training_epochs):
      _, c = sess.run([optimizer, cost], feed_dict={X: x_input, Y:[t for t in y_input.as_matrix()]})
      if(epoch + 1) % display_steps == 0:
        print("Epoch: ", (epoch+1), "Cost: ", c)
#   print("Optimization Finished!")
        test_result = sess.run(pred, feed_dict={X: x_input})
        correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
        accuracy_final=accuracy.eval({X: x_input, Y:[t for t in y_input.as_matrix()]})
#   print "Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]})
        print ("Accuracy:", accuracy_final)
        epoch_list.append(epoch)
        train_loss.append(c)
        train_accuracy.append(accuracy_final)

  
  test_result = sess.run(pred, feed_dict={X: x_test})
  correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
  print ("Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]}))
   
plot_loss()
plot_accuracy() 
end_time = time.time()
print ("Completed in ", end_time - start_time , " seconds")
Epoch:  10 Cost:  2.040994
Accuracy: 0.6792453
Epoch:  20 Cost:  0.5986499
Accuracy: 0.8584906
Epoch:  30 Cost:  0.2666049
Accuracy: 0.7830189
Epoch:  40 Cost:  0.2379817
Accuracy: 0.9245283
Epoch:  50 Cost:  0.14016463
Accuracy: 0.9622642
Epoch:  60 Cost:  0.11789291
Accuracy: 0.9528302
Epoch:  70 Cost:  0.107266
Accuracy: 0.9622642
Epoch:  80 Cost:  0.09728856
Accuracy: 0.9716981
Epoch:  90 Cost:  0.08983598
Accuracy: 0.990566
Epoch:  100 Cost:  0.08443507
Accuracy: 0.990566
Accuracy Test: 0.97727275
Completed in  9.223546028137207  seconds

Final Observation Number of Epochs and Learning Rate:

In the first try as mentioned above I had a Learning Rate of 0.1 and number of epochs as 1000. Hence, I tried to tune these 2 parameters. It was observed that I reached a similar accuracy at a learning rate of 0.1 and number of epochs as 100. This also better as the Training accuracy is 99.05% and testing accuracy is 97.72%.

Training Accuracy =99.06% Testing accuracy= 97.72%

There is defintely an improvement in the training set performance. More over by reducing the number of epochs and increasing the learning rate helped computationally, Hence, reducing the steps to learn did not affect the accuracy in the case of the Iris Dataset.

Hyper Parameter Tuning MLP Neural Network

Next change the optimizer to the Adagrad optimizer. Replace the AdamOptimizer with the Adagrad Optimizer

In [35]:
epoch_list=[]
train_accuracy=[]
train_loss=[]

start_time=time.time()
#Defining the Multiple Layer Perceptron
def model(x, weights, bias):
	layer_1 = tf.add(tf.matmul(x, weights["hidden"]), bias["hidden"])
	layer_1 = tf.nn.relu(layer_1)

	output_layer = tf.matmul(layer_1, weights["output"]) + bias["output"]
	return output_layer
# hyperparameter tuning
learning_rate=0.1
training_epochs=100
display_steps=10
# network parameters
n_input=4
n_hidden=10
n_output=3
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_output])
#weights and biases
weights={
  "hidden":tf.Variable(tf.random_normal([n_input,n_hidden],name="weight_hidden")),
  "output" : tf.Variable(tf.random_normal([n_hidden, n_output]), name="weight_output")
}
bias = {
	"hidden" : tf.Variable(tf.random_normal([n_hidden]), name="bias_hidden"),
	"output" : tf.Variable(tf.random_normal([n_output]), name="bias_output")
}
#Define Model
pred = model(X, weights, bias)
#Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=Y))
optimizer = tf.train.AdagradOptimizer(learning_rate).minimize(cost)
#Initialize Global Variables
init = tf.global_variables_initializer()
start_time=time.time()
with tf.Session() as sess:
  sess.run(init)
  for epoch in range(training_epochs):
      _, c = sess.run([optimizer, cost], feed_dict={X: x_input, Y:[t for t in y_input.as_matrix()]})
      if(epoch + 1) % display_steps == 0:
        print("Epoch: ", (epoch+1), "Cost: ", c)
#   print("Optimization Finished!")
        test_result = sess.run(pred, feed_dict={X: x_input})
        correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
        accuracy_final=accuracy.eval({X: x_input, Y:[t for t in y_input.as_matrix()]})
#   print "Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]})
        print ("Accuracy:", accuracy_final)
        epoch_list.append(epoch)
        train_loss.append(c)
        train_accuracy.append(accuracy_final)

  
  test_result = sess.run(pred, feed_dict={X: x_test})
  correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
  print ("Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]}))
   
plot_loss()
plot_accuracy() 
end_time = time.time()
print ("Completed in ", end_time - start_time , " seconds")
Epoch:  10 Cost:  0.37035173
Accuracy: 0.8584906
Epoch:  20 Cost:  0.31678358
Accuracy: 0.8867925
Epoch:  30 Cost:  0.27425855
Accuracy: 0.8962264
Epoch:  40 Cost:  0.27360934
Accuracy: 0.8867925
Epoch:  50 Cost:  0.21820909
Accuracy: 0.9433962
Epoch:  60 Cost:  0.19785126
Accuracy: 0.9433962
Epoch:  70 Cost:  0.18199822
Accuracy: 0.9433962
Epoch:  80 Cost:  0.16908248
Accuracy: 0.9528302
Epoch:  90 Cost:  0.15834013
Accuracy: 0.9528302
Epoch:  100 Cost:  0.14946839
Accuracy: 0.9622642
Accuracy Test: 0.97727275
Completed in  8.86793065071106  seconds

Observation

It is observed that the training accuracy decreased which could be attributed to random seed but the testing accuracy remains the same . hence, the Adagrad optimizer can be used as an option

Training Accuracy - 96.22% Test Accuracy - 97.72%

Hyper Parameter Tuning for MLP Neural Network

Reuse the code of the first model and replace the optimizer with AdadeltaOptimizer

In [37]:
epoch_list=[]
train_accuracy=[]
train_loss=[]

start_time=time.time()
#Defining the Multiple Layer Perceptron
def model(x, weights, bias):
	layer_1 = tf.add(tf.matmul(x, weights["hidden"]), bias["hidden"])
	layer_1 = tf.nn.relu(layer_1)

	output_layer = tf.matmul(layer_1, weights["output"]) + bias["output"]
	return output_layer
# hyperparameter tuning
learning_rate=0.1
training_epochs=100
display_steps=10
# network parameters
n_input=4
n_hidden=10
n_output=3
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_output])
#weights and biases
weights={
  "hidden":tf.Variable(tf.random_normal([n_input,n_hidden],name="weight_hidden")),
  "output" : tf.Variable(tf.random_normal([n_hidden, n_output]), name="weight_output")
}
bias = {
	"hidden" : tf.Variable(tf.random_normal([n_hidden]), name="bias_hidden"),
	"output" : tf.Variable(tf.random_normal([n_output]), name="bias_output")
}
#Define Model
pred = model(X, weights, bias)
#Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=Y))
optimizer = tf.train.AdadeltaOptimizer(learning_rate).minimize(cost)
#Initialize Global Variables
init = tf.global_variables_initializer()
start_time=time.time()
with tf.Session() as sess:
  sess.run(init)
  for epoch in range(training_epochs):
      _, c = sess.run([optimizer, cost], feed_dict={X: x_input, Y:[t for t in y_input.as_matrix()]})
      if(epoch + 1) % display_steps == 0:
        print("Epoch: ", (epoch+1), "Cost: ", c)
#   print("Optimization Finished!")
        test_result = sess.run(pred, feed_dict={X: x_input})
        correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
        accuracy_final=accuracy.eval({X: x_input, Y:[t for t in y_input.as_matrix()]})
#   print "Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]})
        print ("Accuracy:", accuracy_final)
        epoch_list.append(epoch)
        train_loss.append(c)
        train_accuracy.append(accuracy_final)

  
  test_result = sess.run(pred, feed_dict={X: x_test})
  correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
  print ("Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]}))
   
plot_loss()
plot_accuracy() 
end_time = time.time()
print ("Completed in ", end_time - start_time , " seconds")
Epoch:  10 Cost:  35.242317
Accuracy: 0.3301887
Epoch:  20 Cost:  35.17636
Accuracy: 0.3301887
Epoch:  30 Cost:  35.10967
Accuracy: 0.3301887
Epoch:  40 Cost:  35.042065
Accuracy: 0.3301887
Epoch:  50 Cost:  34.974228
Accuracy: 0.3301887
Epoch:  60 Cost:  34.906036
Accuracy: 0.3301887
Epoch:  70 Cost:  34.837196
Accuracy: 0.3301887
Epoch:  80 Cost:  34.76803
Accuracy: 0.3301887
Epoch:  90 Cost:  34.69819
Accuracy: 0.3301887
Epoch:  100 Cost:  34.62768
Accuracy: 0.3301887
Accuracy Test: 0.3409091
Completed in  9.821969747543335  seconds

Observation:

It is observed that this affected the performance of the model and the network did not improve its performance.

Next , follow the same steps above and replace the optimizer to the Stochastic Gradient Descent Optimizer. You will reach the following results or a result close to the below results due to Random Seed

Testing Accuracy gradientdescent=86.36%

Adadelta=47.72%

Adagrad=97.72%

Final Observation tuning Gradient Estimation

Hence, the best would be Adamoptimizer or the Adagrad Optimizer as these optimizers clearly oput performed the other optimizers

Hyper Parameter Tuning for MLP

Activation Function

Now we will train the model for the following Activation Functions :

  1. Sigmoid
  2. Relu6
  3. Tanh

Reusing the code from the intial model , replace the activation tf.nn.relu with tf.nn.Sigmoid. All other parameters must be set to the intial model parameters.

Below is the a sample when I ran with activation function Sigmoid.

In [43]:
epoch_list=[]
train_accuracy=[]
train_loss=[]

start_time=time.time()
#Defining the Multiple Layer Perceptron
def model(x, weights, bias):
	layer_1 = tf.add(tf.matmul(x, weights["hidden"]), bias["hidden"])
	layer_1 = tf.nn.sigmoid(layer_1)

	output_layer = tf.matmul(layer_1, weights["output"]) + bias["output"]
	return output_layer
# hyperparameter tuning
learning_rate=0.1
training_epochs=100
display_steps=10
# network parameters
n_input=4
n_hidden=10
n_output=3
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_output])
#weights and biases
weights={
  "hidden":tf.Variable(tf.random_normal([n_input,n_hidden],name="weight_hidden")),
  "output" : tf.Variable(tf.random_normal([n_hidden, n_output]), name="weight_output")
}
bias = {
	"hidden" : tf.Variable(tf.random_normal([n_hidden]), name="bias_hidden"),
	"output" : tf.Variable(tf.random_normal([n_output]), name="bias_output")
}
#Define Model
pred = model(X, weights, bias)
#Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
#Initialize Global Variables
init = tf.global_variables_initializer()
start_time=time.time()
with tf.Session() as sess:
  sess.run(init)
  for epoch in range(training_epochs):
      _, c = sess.run([optimizer, cost], feed_dict={X: x_input, Y:[t for t in y_input.as_matrix()]})
      if(epoch + 1) % display_steps == 0:
        print("Epoch: ", (epoch+1), "Cost: ", c)
#   print("Optimization Finished!")
        test_result = sess.run(pred, feed_dict={X: x_input})
        correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
        accuracy_final=accuracy.eval({X: x_input, Y:[t for t in y_input.as_matrix()]})
#   print "Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]})
        print ("Accuracy:", accuracy_final)
        epoch_list.append(epoch)
        train_loss.append(c)
        train_accuracy.append(accuracy_final)

  
  test_result = sess.run(pred, feed_dict={X: x_test})
  correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(Y, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
  print ("Accuracy Test:", accuracy.eval({X: x_test, Y:[t for t in y_test.as_matrix()]}))
   
plot_loss()
plot_accuracy() 
end_time = time.time()
print ("Completed in ", end_time - start_time , " seconds")
Epoch:  10 Cost:  0.54932
Accuracy: 0.6792453
Epoch:  20 Cost:  0.38948652
Accuracy: 0.9245283
Epoch:  30 Cost:  0.22740783
Accuracy: 0.9811321
Epoch:  40 Cost:  0.11654143
Accuracy: 0.9811321
Epoch:  50 Cost:  0.078980714
Accuracy: 0.9811321
Epoch:  60 Cost:  0.06591436
Accuracy: 0.9716981
Epoch:  70 Cost:  0.060725108
Accuracy: 0.9716981
Epoch:  80 Cost:  0.058212988
Accuracy: 0.9811321
Epoch:  90 Cost:  0.05667977
Accuracy: 0.9811321
Epoch:  100 Cost:  0.055561863
Accuracy: 0.9811321
Accuracy Test: 1.0
Completed in  12.254695892333984  seconds

Observation: It is observed that the sigmoid activation function provided a training accuracy of 98.11 and testing accuracy of 100%. Though this is a good result I would consider an average of the accuracy in attribution to the Random seed.

Theabove steps can be repeated for Tanh and Relu6 activation function. After performing these you should reach an accuracy similar to the below accuracy.

It is observed that tanh provided an accuracy of 50%, relu6 provided 61.36 and sigmoid=97.72%.

Final Observation for Tuning Activation Function

It is observed that Relu and sigmoid performed the best. It is observed that both can be taken into consideration while tuning activation functions to improve accuracy.

Concluding the Hyper Parameter Tuning of MLP

  1. It is observed that tweeking the number of epochs and learning rate was an effective combination to tune. Reducing it to 100 and 0.1 epochs provided stability to the training and testing set. Though we must take into account the random seed generation

  2. The Optimizer Adam and Adagrad worked the best and can be used for tuning

  3. The Activation function Sigmoid and Relu worked the best as well

As a conclusion prospective hyper parameters to tune to improve the performance of the MLP Model are Activation, Optimization and a combination of number of epochs and leanring rate.

Summary

image.png

Convolutional Neural Network

CNN are popularly used for image and video recognition. It is a deep feed forward neural network used for used for visual imagery. CNN's use very less pre processing , that is the algorithm learns from the filters in contrary to traditional algorithms. This is big advantage .

A CNN consists of a convolutional layer followed by an pooling layer is then fed with non linear activation function. The convolutional layer consists of a filters that learn every pixel of image. This is the applied to pooling which is a kind of down sampling. It tries to identify all the rough area of image over the exact area of the image to reduce parameters and computation in the network

Structure of the CNN Model Used

This Neural Network consists 2 Convolutional Layers. Each Convolutional layer has an activation function sigmoid and an Average Pooling layer. The 2 Convolutional layers are preceded by the flattened layer and the fully connected layer. The model is a LENET model. It uses the Softmax Cross Entropy cost function and optimizer as Stochastic Gradient Descent. The model is initialized for 20000 epochs

Dataset Description

image.png

The Cifar 10 dataset is a collection of images that are commonly used to train machine learning and computer vision algorithms. It is majorly used for machine learning research. The CIFAR-10 dataset contains 60,000 32x32 color images in 10 different classes. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks. There are 6000 images for each class. This hello world dataset is popularly used for Object Detection and Recognition.

Steps to Use the Data Set

  1. Download the dataset from the following website : https://www.cs.toronto.edu/~kriz/cifar.html
  2. Select the download link named 'CIFAR-10 python version'
  3. The folder must unzipped and untarred
  4. Navigate to cifar-10-batches-py folder and copy all files
  5. Paste these in the Jupyter Notebook path as mentioned in the first MLP Section

Processing the CIFAR10 Data

The Cifar 10 dataset consists of images and their corresponding labels. There are 10 labels. First we will write functions to preprocess the dataset.

  1. randomize Function : This function is used to shuffle the images and images
  2. one_hot_encode Function : This function is used to convert the 10 labels to one hot encoded arrays for each label
  3. reformat_data Function: This function is used to reshape the images according to the selected dimensions which is image_width,image_height,image_depth
  4. flatten_tf_array: this used to flatten the tensorflow array
  5. accuracy : This function is used to calculate accuracy
In [10]:
# function is randomize the dataset and shuffle
def randomize(dataset, labels):
    permutation = np.random.permutation(labels.shape[0])
    shuffled_dataset = dataset[permutation, :, :]
    shuffled_labels = labels[permutation]
    return shuffled_dataset, shuffled_labels

# one hot encode each pixel 
def one_hot_encode(np_array):
    return (np.arange(10) == np_array[:,None]).astype(np.float32)

# reformating the data for the image_width,image_height,image_depth
def reformat_data(dataset, labels, image_width, image_height, image_depth):
    np_dataset_ = np.array([np.array(image_data).reshape(image_width, image_height, image_depth) for image_data in dataset])
    np_labels_ = one_hot_encode(np.array(labels, dtype=np.float32))
    np_dataset, np_labels = randomize(np_dataset_, np_labels_)
    return np_dataset, np_labels

# Flattening the array 
def flatten_tf_array(array):
    shape = array.get_shape().as_list()
    return tf.reshape(array, [shape[0], shape[1] * shape[2] * shape[3]])

# Calculating accuracy 
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])

Reading and Preprocessing the Data

Since, the data files are in the Jupyter Notebook path they will be read automatically Here , we read the images from the train and test set and reformat the images based on the dimensions selected. The dimensions selected are as follows :

c10_image_height = 32

c10_image_width = 32

c10_image_depth = 3

c10_num_labels = 10

Follow the comments to understand the code below

In [11]:
# Code attribution by https://github.com/taspinar/sidl
import pickle
import numpy as np

# This ensures that the code always reads the jupyter notebook path
cifar10_folder = './'
train_datasets = ['data_batch_1', 'data_batch_2', 'data_batch_3', 'data_batch_4', 'data_batch_5', ]
test_dataset = ['test_batch']

# images dimensions of CIFAR 10
c10_image_height = 32
c10_image_width = 32
c10_image_depth = 3
# there are 10 classes , hence 10 labels
c10_num_labels = 10
 
#Reading each image from train and test batches  
with open(cifar10_folder + test_dataset[0], 'rb') as f0:
    c10_test_dict = pickle.load(f0, encoding='bytes')
 
c10_test_dataset, c10_test_labels = c10_test_dict[b'data'], c10_test_dict[b'labels']
test_dataset_cifar10, test_labels_cifar10 = reformat_data(c10_test_dataset, c10_test_labels, c10_image_width, c10_image_height, c10_image_depth)
 
# images saved in train_dataset and labels saved in train_labels   
c10_train_dataset, c10_train_labels = [], []
for train_dataset in train_datasets:
    with open(cifar10_folder + train_dataset, 'rb') as f0:
        c10_train_dict = pickle.load(f0, encoding='bytes')
        c10_train_dataset_, c10_train_labels_ = c10_train_dict[b'data'], c10_train_dict[b'labels']
 
        c10_train_dataset.append(c10_train_dataset_)
        c10_train_labels += c10_train_labels_
# reformatting the training data 
c10_train_dataset = np.concatenate(c10_train_dataset, axis=0)
train_dataset_cifar10, train_labels_cifar10 = reformat_data(c10_train_dataset, c10_train_labels, c10_image_width, c10_image_height, c10_image_depth)
del c10_train_dataset
del c10_train_labels
 
print("The training set contains the following labels: {}".format(np.unique(c10_train_dict[b'labels'])))
print('Training set shape', train_dataset_cifar10.shape, train_labels_cifar10.shape)
print('Test set shape', test_dataset_cifar10.shape, test_labels_cifar10.shape)
The training set contains the following labels: [0 1 2 3 4 5 6 7 8 9]
Training set shape (50000, 32, 32, 3) (50000, 10)
Test set shape (10000, 32, 32, 3) (10000, 10)

Exploratory Data Analysis

The code in the in this section namely Exploratory Data Analysis by Magnus Erik Hvass Pedersen is licensed under the MIT License

Lets look at each function that is being used visualize the CIFAR 10 Data. All functions beloew are used to visualize the data. FOllow the comments to understand each function

In [29]:
# Functions
#The code in the in this section namely Exploratory Data Analysis by Magnus Erik Hvass Pedersen is licensed under the MIT License
import math
import os
%matplotlib inline
import matplotlib.pyplot as plt

# Path where the CIFAR 10 files are present
data_path = "./"

# Various constants for the size of the images.
# Use these constants in your own program.

# Width and height of each image.
img_size = 32

# Number of channels in each image, 3 channels: Red, Green, Blue.
num_channels = 3

# Length of an image when flattened to a 1-dim array.
img_size_flat = img_size * img_size * num_channels

# Number of classes.
num_classes = 10

# Various constants used to allocate arrays of the correct size.

# Number of files for the training-set.
_num_files_train = 5

# Number of images for each batch-file in the training-set.
_images_per_file = 10000

# Total number of images in the training-set.
# This is used to pre-allocate arrays for efficiency.
_num_images_train = _num_files_train * _images_per_file


# This function is used to unpickle the Test and Train files and load the data. It reads each byte of the image
def _unpickle(filename):
    """
    Unpickle the given file and return the data.
    Note that the appropriate dir-name is prepended the filename.
    """

    # Create full path for the file.
    file_path = _get_file_path(filename)

    print("Loading data: " + file_path)

    with open(file_path, mode='rb') as file:
        # In Python 3.X it is important to set the encoding,
        # otherwise an exception is raised here.
        data = pickle.load(file, encoding='bytes')

    return data

# This function takes the raw image , reshapes it into arrays and returns the image
def _convert_images(raw):
    """
    Convert images from the CIFAR-10 format and
    return a 4-dim array with shape: [image_number, height, width, channel]
    where the pixels are floats between 0.0 and 1.0.
    """

    # Convert the raw images from the data-files to floating-points.
    raw_float = np.array(raw, dtype=float) / 255.0

    # Reshape the array to 4-dimensions.
    images = raw_float.reshape([-1, num_channels, img_size, img_size])

    # Reorder the indices of the array.
    images = images.transpose([0, 2, 3, 1])

    return images

# this function is used to load the data , unpickle and reshape the images
def _load_data(filename):
    """
    Load a pickled data-file from the CIFAR-10 data-set
    and return the converted images (see above) and the class-number
    for each image.
    """

    # Load the pickled data-file.
    data = _unpickle(filename)

    # Get the raw images.
    raw_images = data[b'data']

    # Get the class-numbers for each image. Convert to numpy-array.
    cls = np.array(data[b'labels'])

    # Convert the images.
    images = _convert_images(raw_images)

    return images, cls

# This function is used to convert the images into one hot encoded values
def one_hot_encoded_labels(class_numbers, num_classes=None):
    if num_classes is None:
        num_classes = np.max(class_numbers) + 1

    return np.eye(num_classes, dtype=float)[class_numbers]

# we will be using the test batch to visualize the images , hence we read the test batch.
#This function then loads the images as one hot encoded values 
def load_test_data():
    """
    Load all the test-data for the CIFAR-10 data-set.
    Returns the images, class-numbers and one-hot encoded class-labels.
    """

    images, cls = _load_data(filename="test_batch")

    return images, cls, one_hot_encoded_labels(class_numbers=cls, num_classes=num_classes)

# Get the path of file currently present
def _get_file_path(filename=""):
    """
    Return the full path of a data-file for the data-set.
    If filename=="" then return the directory of the files.
    """

    return os.path.join(data_path, filename)

# Load class names. This function assigns the class name to each image
def load_class_names():
    """
    Load the names for the classes in the CIFAR-10 data-set.
    Returns a list with the names. Example: names[3] is the name
    associated with class-number 3.
    """

    # Load the class-names from the pickled file.
    raw = _unpickle(filename="batches.meta")[b'label_names']

    # Convert from binary strings.
    names = [x.decode('utf-8') for x in raw]

    return names
In [30]:
# Lets load the test batch using the load_test_data function that will retriece the image, labele and the array image
images_test, cls_test, labels_test = load_test_data()
Loading data: ./test_batch
In [31]:
# This function ised used to plt the image with the desired the image dimensions.
def plot_images(images, cls_true, cls_pred=None, smooth=True):

    assert len(images) == len(cls_true) == 9

    # Create figure with sub-plots.
    fig, axes = plt.subplots(3, 3)

    # Adjust vertical spacing if we need to print ensemble and best-net.
    if cls_pred is None:
        hspace = 0.3
    else:
        hspace = 0.6
    fig.subplots_adjust(hspace=hspace, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Interpolation type.
        if smooth:
            interpolation = 'spline16'
        else:
            interpolation = 'nearest'

        # Plot image.
        ax.imshow(images[i, :, :, :],
                  interpolation=interpolation)
            
        # Name of the true class.
        cls_true_name = class_names[cls_true[i]]

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true_name)
        else:
            # Name of the predicted class.
            cls_pred_name = class_names[cls_pred[i]]

            xlabel = "True: {0}\nPred: {1}".format(cls_true_name, cls_pred_name)

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)
        
        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])
    
    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
In [33]:
# lets have a look at the classnames
class_names=load_class_names()
class_names
Loading data: ./batches.meta
Out[33]:
['airplane',
 'automobile',
 'bird',
 'cat',
 'deer',
 'dog',
 'frog',
 'horse',
 'ship',
 'truck']

Finally lets try and visalize the image in the test set

In [34]:
# Get the first images from the test-set.
images = images_test[0:9]

# Get the true classes for those images.
cls_true = cls_test[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true, smooth=False)

The above code can lso be used to visualize the number of wrong image labels predicted by the algorithm and the true labesl for visualizing. Currently not present.

Building the Deep Neural Network

Convoluted Neural Network

CIFAR 10 has 10 classes of about 60000 images. The images can be classified into 10 classes 'airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck'. This dataset takes extremely long to train since there are around 60000 images. Of the various references take it is observed that it takes approximately 150000 epochs to get CIFAR10 ti an accuracy of 80%. Since, I did not have the computing power I trained the CNN for 20000 epochs which took me approximately 8 hours. In order to test various Hyper parameters I have used a benchmark of 7000 epochs as the CNN model with a Sigmoid Activation took 7000 epochs to get to accuracy of 42 %.

This CNN model has 2 Convloluted layers with the activation functions of Sigmoid and the pooling layer formed by Average Pooling. Followed by the 2 Convoluted layers is the layer for flattening followed by the fully connected layer. This also uses a activation of Sigmoid. The classification is done using Softmax. The optimizer used is Gradient Descent Optimizer. The Loss is calculted as softmax_cross_entropy_with_logits. The learning rate used is 0.5 The accuracy received is after training the model for 20000 epochs is 56% for training and 48% as testing accuracy. Time to train is almost 10 hours without any support from a GPU.

Code for Initial Model CNN using Tensorflow - LENET-5

Now lets code the model. We first define the image dimensions. This is required for the filters as they must scan through the images. Next we create function that will create the weights and bias for each of the layers. Next the function model_lenet5 assigns the weights and bias for each of the layers. This model is known as LENET-5 model.

We will use the values obtained at the 700o epoch as the bench marck for comparing the effect of various hyper parameters tuned

In [6]:
import tensorflow as tf
# assign the dimensions
LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    #Network Initialization
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
def model_lenet5(data, variables):
    #first convolution layer, followed by the pooling layer and the activation function
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    
    #Second convolution layer, followed by the pooling layer and the activation function. It takes the first layer as input
    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    
    #Flat layer followed by the fully connected layer. It takes the second convolution layer layer as input
    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    # Last layer is the fully connected layer that rovides the output
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits
C:\Users\jaini\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

Next, we assign the various hyper parameters like number of epochs defined as num_steps, display_steps, learning_rate and batch size, loss and optimizer

In [7]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10

#number of iterations and learning rate
num_steps = 20001
display_step = 200
learning_rate = 0.5
#batch size
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))
WARNING:tensorflow:From <ipython-input-7-f631688a48d2>:35: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Lets run the tensorflow session.Since we have selected a batch size, there will be a calculation involved in providing a batch size to the network.

In [37]:
### running the tensorflow session
with tf.Session(graph=graph) as session:
### initialize all variables before the tensor flow graph is run
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        # creating batches
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
# display the train and test predictions     
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
#evaluate the test predictions
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            message = "step {:04d} : loss is {:06.2f}, accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with learning_rate 0.5
step 0000 : loss is 002.69, accuracy on training set 0.00 %, accuracy on test set 10.00 %
step 0200 : loss is 002.31, accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 0400 : loss is 002.31, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 0600 : loss is 002.30, accuracy on training set 15.62 %, accuracy on test set 10.00 %
step 0800 : loss is 002.30, accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 1000 : loss is 002.30, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 1200 : loss is 002.27, accuracy on training set 15.62 %, accuracy on test set 13.70 %
step 1400 : loss is 002.28, accuracy on training set 14.06 %, accuracy on test set 15.62 %
step 1600 : loss is 002.20, accuracy on training set 15.62 %, accuracy on test set 16.72 %
step 1800 : loss is 002.08, accuracy on training set 23.44 %, accuracy on test set 20.49 %
step 2000 : loss is 002.16, accuracy on training set 15.62 %, accuracy on test set 19.62 %
step 2200 : loss is 002.07, accuracy on training set 23.44 %, accuracy on test set 22.78 %
step 2400 : loss is 002.09, accuracy on training set 15.62 %, accuracy on test set 23.45 %
step 2600 : loss is 002.02, accuracy on training set 23.44 %, accuracy on test set 24.22 %
step 2800 : loss is 001.97, accuracy on training set 37.50 %, accuracy on test set 26.72 %
step 3000 : loss is 001.98, accuracy on training set 21.88 %, accuracy on test set 29.14 %
step 3200 : loss is 001.95, accuracy on training set 28.12 %, accuracy on test set 30.47 %
step 3400 : loss is 002.04, accuracy on training set 26.56 %, accuracy on test set 31.34 %
step 3600 : loss is 001.75, accuracy on training set 40.62 %, accuracy on test set 33.41 %
step 3800 : loss is 001.68, accuracy on training set 35.94 %, accuracy on test set 32.33 %
step 4000 : loss is 001.99, accuracy on training set 25.00 %, accuracy on test set 35.13 %
step 4200 : loss is 001.80, accuracy on training set 35.94 %, accuracy on test set 34.20 %
step 4400 : loss is 001.77, accuracy on training set 37.50 %, accuracy on test set 34.92 %
step 4600 : loss is 001.85, accuracy on training set 28.12 %, accuracy on test set 36.95 %
step 4800 : loss is 001.76, accuracy on training set 39.06 %, accuracy on test set 34.17 %
step 5000 : loss is 001.75, accuracy on training set 39.06 %, accuracy on test set 37.31 %
step 5200 : loss is 001.78, accuracy on training set 35.94 %, accuracy on test set 36.27 %
step 5400 : loss is 001.69, accuracy on training set 35.94 %, accuracy on test set 37.35 %
step 5600 : loss is 001.65, accuracy on training set 50.00 %, accuracy on test set 37.96 %
step 5800 : loss is 001.70, accuracy on training set 50.00 %, accuracy on test set 36.89 %
step 6000 : loss is 001.68, accuracy on training set 29.69 %, accuracy on test set 40.10 %
step 6200 : loss is 001.53, accuracy on training set 45.31 %, accuracy on test set 37.39 %
step 6400 : loss is 001.66, accuracy on training set 40.62 %, accuracy on test set 40.01 %
step 6600 : loss is 001.73, accuracy on training set 25.00 %, accuracy on test set 39.62 %
step 6800 : loss is 001.68, accuracy on training set 39.06 %, accuracy on test set 40.36 %
step 7000 : loss is 001.69, accuracy on training set 42.19 %, accuracy on test set 42.72 %
step 7200 : loss is 001.50, accuracy on training set 50.00 %, accuracy on test set 41.09 %
step 7400 : loss is 001.36, accuracy on training set 45.31 %, accuracy on test set 36.28 %
step 7600 : loss is 001.53, accuracy on training set 51.56 %, accuracy on test set 41.34 %
step 7800 : loss is 001.72, accuracy on training set 43.75 %, accuracy on test set 42.44 %
step 8000 : loss is 001.69, accuracy on training set 35.94 %, accuracy on test set 41.65 %
step 8200 : loss is 001.79, accuracy on training set 37.50 %, accuracy on test set 43.68 %
step 8400 : loss is 001.39, accuracy on training set 53.12 %, accuracy on test set 43.82 %
step 8600 : loss is 001.54, accuracy on training set 48.44 %, accuracy on test set 41.55 %
step 8800 : loss is 001.76, accuracy on training set 39.06 %, accuracy on test set 41.50 %
step 9000 : loss is 001.57, accuracy on training set 45.31 %, accuracy on test set 44.09 %
step 9200 : loss is 001.52, accuracy on training set 40.62 %, accuracy on test set 43.73 %
step 9400 : loss is 001.60, accuracy on training set 35.94 %, accuracy on test set 43.08 %
step 9600 : loss is 001.61, accuracy on training set 43.75 %, accuracy on test set 42.47 %
step 9800 : loss is 001.55, accuracy on training set 39.06 %, accuracy on test set 45.20 %
step 10000 : loss is 001.40, accuracy on training set 56.25 %, accuracy on test set 45.28 %
step 10200 : loss is 001.57, accuracy on training set 39.06 %, accuracy on test set 45.06 %
step 10400 : loss is 001.46, accuracy on training set 43.75 %, accuracy on test set 44.62 %
step 10600 : loss is 001.61, accuracy on training set 42.19 %, accuracy on test set 43.12 %
step 10800 : loss is 001.67, accuracy on training set 45.31 %, accuracy on test set 45.61 %
step 11000 : loss is 001.83, accuracy on training set 32.81 %, accuracy on test set 43.95 %
step 11200 : loss is 001.54, accuracy on training set 46.88 %, accuracy on test set 41.88 %
step 11400 : loss is 001.46, accuracy on training set 45.31 %, accuracy on test set 45.62 %
step 11600 : loss is 001.66, accuracy on training set 35.94 %, accuracy on test set 46.26 %
step 11800 : loss is 001.30, accuracy on training set 56.25 %, accuracy on test set 45.63 %
step 12000 : loss is 001.56, accuracy on training set 45.31 %, accuracy on test set 45.72 %
step 12200 : loss is 001.42, accuracy on training set 43.75 %, accuracy on test set 45.58 %
step 12400 : loss is 001.72, accuracy on training set 43.75 %, accuracy on test set 45.08 %
step 12600 : loss is 001.69, accuracy on training set 40.62 %, accuracy on test set 46.14 %
step 12800 : loss is 001.55, accuracy on training set 43.75 %, accuracy on test set 44.23 %
step 13000 : loss is 001.61, accuracy on training set 48.44 %, accuracy on test set 44.36 %
step 13200 : loss is 001.46, accuracy on training set 48.44 %, accuracy on test set 45.06 %
step 13400 : loss is 001.31, accuracy on training set 51.56 %, accuracy on test set 45.96 %
step 13600 : loss is 001.44, accuracy on training set 45.31 %, accuracy on test set 45.32 %
step 13800 : loss is 001.50, accuracy on training set 48.44 %, accuracy on test set 46.61 %
step 14000 : loss is 001.51, accuracy on training set 46.88 %, accuracy on test set 46.63 %
step 14200 : loss is 001.41, accuracy on training set 46.88 %, accuracy on test set 45.85 %
step 14400 : loss is 001.68, accuracy on training set 35.94 %, accuracy on test set 45.59 %
step 14600 : loss is 001.52, accuracy on training set 53.12 %, accuracy on test set 45.22 %
step 14800 : loss is 001.68, accuracy on training set 42.19 %, accuracy on test set 44.10 %
step 15000 : loss is 001.38, accuracy on training set 54.69 %, accuracy on test set 46.48 %
step 15200 : loss is 001.18, accuracy on training set 54.69 %, accuracy on test set 47.66 %
step 15400 : loss is 001.37, accuracy on training set 56.25 %, accuracy on test set 47.01 %
step 15600 : loss is 001.58, accuracy on training set 46.88 %, accuracy on test set 46.41 %
step 15800 : loss is 001.63, accuracy on training set 43.75 %, accuracy on test set 48.10 %
step 16000 : loss is 001.89, accuracy on training set 32.81 %, accuracy on test set 46.96 %
step 16200 : loss is 001.34, accuracy on training set 53.12 %, accuracy on test set 49.01 %
step 16400 : loss is 001.33, accuracy on training set 53.12 %, accuracy on test set 47.07 %
step 16600 : loss is 001.37, accuracy on training set 56.25 %, accuracy on test set 44.77 %
step 16800 : loss is 001.42, accuracy on training set 50.00 %, accuracy on test set 47.92 %
step 17000 : loss is 001.19, accuracy on training set 60.94 %, accuracy on test set 48.59 %
step 17200 : loss is 001.27, accuracy on training set 54.69 %, accuracy on test set 47.66 %
step 17400 : loss is 001.49, accuracy on training set 48.44 %, accuracy on test set 47.83 %
step 17600 : loss is 001.40, accuracy on training set 46.88 %, accuracy on test set 47.81 %
step 17800 : loss is 001.30, accuracy on training set 53.12 %, accuracy on test set 49.11 %
step 18000 : loss is 001.60, accuracy on training set 42.19 %, accuracy on test set 48.69 %
step 18200 : loss is 001.56, accuracy on training set 43.75 %, accuracy on test set 48.51 %
step 18400 : loss is 001.49, accuracy on training set 45.31 %, accuracy on test set 46.38 %
step 18600 : loss is 001.43, accuracy on training set 43.75 %, accuracy on test set 47.89 %
step 18800 : loss is 001.70, accuracy on training set 43.75 %, accuracy on test set 47.67 %
step 19000 : loss is 001.12, accuracy on training set 60.94 %, accuracy on test set 48.63 %
step 19200 : loss is 001.44, accuracy on training set 46.88 %, accuracy on test set 46.01 %
step 19400 : loss is 001.44, accuracy on training set 50.00 %, accuracy on test set 47.77 %
step 19600 : loss is 001.21, accuracy on training set 56.25 %, accuracy on test set 49.35 %
step 19800 : loss is 001.32, accuracy on training set 50.00 %, accuracy on test set 48.12 %
step 20000 : loss is 001.45, accuracy on training set 56.25 %, accuracy on test set 48.13 %

Initial Observation of the model :

The maximum accuracy after training for 20000 epochs is 56.25% and the that of the testing set is 48.13%. Various hyper parameters were tuned to improve accuarcy. Learning rate was tried for 0.01, 0.0001 and 0.1. All these did not show desired results as the testing accuracy would improve very slowly. Hence, the learning rate of 0.5 worked out the best. There is much more scope for improving accuracy , more towards training the model for a larger number of epochs. As referenced earlier this would take around 15 hours to bring to 80% accuracy without computational assistance. Various Parameters have been tuned and explored below.

Since, I did not have the computing power I trained the CNN for 20000 epochs which took me approximately 8 hours. In order to test various Hyper parameters I have used a benchmark of 7000 epochs as the CNN model with a Sigmoid Activation took 7000 epochs to get to accuracy of 42 %.

NOTE : Below Sections onwards the parameters are trained for 7000 epochs . As observed above it can be seen that the neural network gradually rose to the accuracy of 42% at around 7000 epochs. This is being done as training the neural network takes extremely long.

Next lets dwelve into hyper parameter tuning to observe if the CNN performance can be improved. We will tuning activation functions, combination of loss and activation functions, number of epochs, gradient estimation, network architecture and network initialization

Hyper Parameter Tuning for CNN

Activation Functions

Observing any change in accuracy with change in activation functions. Since, training takes very long for 20000 epochs, the change in accuracy will be observed for 7001 epochs and we will notice any change in the network. I chose 7001 epochs as in the case of a sigmoid activation the accuracy reached 40%. Hence, I will use that as a benchmark and train the model with the below activation functions

Activation Functions that are considered are - tanh, relu

Lets strat by training the model with the activation of tanh. Reuse the code of the initial model and replace tf.nn.relu with tf.nn.tanh in the function named model_lenet5. Follow the commnets in the code to see the change in activation function

In [85]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
### Diffrent hyperparameters to tune

#change activation here to tanh. Replace tf.nn.relu
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.nn.tanh(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.nn.tanh(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.tanh(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.tanh(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Reuse the initial model code that was used to set the various hyper parameters, but set the num_steps as 7001. Follow the comments to see the change

In [86]:
from collections import defaultdict
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10

#number of iterations and learning rate
#change num_steps to 7001
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64


graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Reuse the code for the initializing the tensorflow session and run the code. Initlaize lists to store the training accuracy , testing accuracy and epochs displayed as list names train,test and display respectively

In [87]:
#Initlaize lists to store the training accuracy , testing accuracy and epochs
import pandas as pd
train=[]
test=[]
display=[]
### running the tensorflow session
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
# aplying loss and optimizer to the training set
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
# use the eval function to evelaute the accuracy of the test set
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
# append the values to the list to display the graphs of accuracy
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f}, accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with learning_rate 0.5
step 0000 : loss is 002.47, accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0200 : loss is 013.44, accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 0400 : loss is 011.02, accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0600 : loss is 014.55, accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 0800 : loss is 012.88, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 1000 : loss is 008.54, accuracy on training set 21.88 %, accuracy on test set 10.00 %
step 1200 : loss is 009.87, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 1400 : loss is 016.72, accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 1600 : loss is 016.35, accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 1800 : loss is 015.69, accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 2000 : loss is 009.08, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 2200 : loss is 009.46, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 2400 : loss is 014.63, accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 2600 : loss is 011.99, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 2800 : loss is 017.66, accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 3000 : loss is 014.53, accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 3200 : loss is 011.43, accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 3400 : loss is 011.98, accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 3600 : loss is 016.66, accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 3800 : loss is 015.33, accuracy on training set 15.62 %, accuracy on test set 10.00 %
step 4000 : loss is 017.34, accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 4200 : loss is 018.77, accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 4400 : loss is 011.53, accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 4600 : loss is 010.68, accuracy on training set 15.62 %, accuracy on test set 10.00 %
step 4800 : loss is 008.81, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 5000 : loss is 008.94, accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 5200 : loss is 007.52, accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 5400 : loss is 015.26, accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 5600 : loss is 008.69, accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 5800 : loss is 012.07, accuracy on training set 15.62 %, accuracy on test set 10.00 %
step 6000 : loss is 013.03, accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 6200 : loss is 018.56, accuracy on training set 20.31 %, accuracy on test set 10.00 %
step 6400 : loss is 009.13, accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 6600 : loss is 011.55, accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 6800 : loss is 011.87, accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 7000 : loss is 015.01, accuracy on training set 20.31 %, accuracy on test set 10.00 %

Lets plot the model accuracy for the testing and training set

In [88]:
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

OBSERVATION

It is observed that in the case of activation function Tanh , the network did not plateau. Moreover, the test accuracy did not even improve more than 10% through all the epochs. Hence, in the case the network did not plateau and this activation function was not suitable for he network.

Train Accuracy= 22% Test Accuracy = 10%

Hyper Parameter Tuning for CNN

Activation Function - Relu

Now lets follow the similar steps as above and this time replace tanh with Relu in th function name model_lenet5. Follow the comments to see the change made.

In [3]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
### Diffrent hyperparameters to tune
#activation={'tanh' :  tf.nn.tanh,'relu': tf.nn.relu, 'softplus' : tf.nn.softplus}
# change activation functions to relu in each instance of Sigmoid
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.nn.relu(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.relu(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.relu(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits
C:\Users\jaini\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

Reuse the code to initial hyper parameters. Make sure number of epochs id 7001

In [4]:
from collections import defaultdict
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10

#number of iterations and learning rate
#change number of epochs to 7001
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64


graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))
WARNING:tensorflow:From <ipython-input-4-7a354b10c60a>:36: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Initialize the tensor flow session and runeveluation on the training and testing set

In [6]:
train=[]
test=[]
display=[]
# init = tf.initialize_all_variables()
### running the tensorflow session
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:10.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with learning_rate 0.5
step 0000 : loss is      71.60 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0200 : loss is       2.32 , accuracy on training set 3.12 %, accuracy on test set 10.00 %
step 0400 : loss is       2.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0600 : loss is       2.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 0800 : loss is       2.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 1000 : loss is       2.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 1200 : loss is       2.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 1400 : loss is       2.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 1600 : loss is       2.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1800 : loss is       2.31 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 2000 : loss is       2.29 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 2200 : loss is       2.31 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 2400 : loss is       2.30 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 2600 : loss is       2.31 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 2800 : loss is       2.32 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 3000 : loss is       2.29 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 3200 : loss is       2.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 3400 : loss is       2.31 , accuracy on training set 3.12 %, accuracy on test set 10.00 %
step 3600 : loss is       2.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 3800 : loss is       2.31 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 4000 : loss is       2.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 4200 : loss is       2.29 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 4400 : loss is       2.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 4600 : loss is       2.29 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 4800 : loss is       2.30 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 5000 : loss is       2.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 5200 : loss is       2.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 5400 : loss is       2.30 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 5600 : loss is       2.29 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 5800 : loss is       2.30 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 6000 : loss is       2.31 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 6200 : loss is       2.30 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 6400 : loss is       2.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 6600 : loss is       2.31 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 6800 : loss is       2.31 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 7000 : loss is       2.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
In [7]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Final Observation for Hyper Parameter tuning for Activation Function :

It is observed that similar to Tanh , there was no increase in Testing accuracy. Though it can be taken into consideration to run the CNN for larger number of epochs for sure. But Considering the use of Activation function Sigmoid provided an accuracy of 42% at 7000 epochs, Tanh and ReLu did not provide the same. Even in this case the network did not plateau and was not even close to plateau.

Train Accuracy = 14% Test Accuracy =10%

Next, we will try a combination of Loss function and Activation function

Hyperparameter Tuning Loss Function and Activation Function for CNN

Cost Function -- Hinge_Loss with Reduce Mean with activation function Tanh

Lets try a combination of Hinge Loss and Activation Function Tanh.

Hinge Loss is a loss used to evaluate classifiers . It is used as a maximum margin classification.

Use the initila model defined and replay the activation function in the model_lenet5 function to Tanh. Follow the comments observe the change in the below code

In [9]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
### Diffrent hyperparameters to tune, replace sigmoid with tanh in the below function for all occurences
# activation={'tanh' :  tf.nn.tanh,'relu': tf.nn.relu, 'softplus' : tf.nn.softplus}
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.nn.tanh(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.nn.tanh(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.tanh(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.tanh(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Now, we will change the cost function to accomadate hinge loss. Follow the comments to observe the change made.

In [11]:
from collections import defaultdict
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10

#number of iterations and learning rate
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64


graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

# Chenge the softmax function to incorporate Hinge Loss
#then we compute the hinge loss between the logits and the (actual) labels
#     the loss is then reduced by tf.losses.mean_squared_error
    loss = tf.reduce_mean(tf.losses.hinge_loss(logits=logits, labels=tf_train_labels))

    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Reuse the code to run the tensorflow session as per the initial mode. Ensure num_steps=7001

In [12]:
import pandas as pd
train=[]
test=[]
display=[]
### running the tensorflow session
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f}, accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with learning_rate 0.5
step 0000 : loss is 001.80, accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0200 : loss is 000.27, accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0400 : loss is 000.24, accuracy on training set 12.50 %, accuracy on test set 9.99 %
step 0600 : loss is 000.24, accuracy on training set 21.88 %, accuracy on test set 10.00 %
step 0800 : loss is 000.23, accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 1000 : loss is 000.23, accuracy on training set 7.81 %, accuracy on test set 10.29 %
step 1200 : loss is 000.22, accuracy on training set 25.00 %, accuracy on test set 15.50 %
step 1400 : loss is 000.21, accuracy on training set 20.31 %, accuracy on test set 19.11 %
step 1600 : loss is 000.23, accuracy on training set 6.25 %, accuracy on test set 14.24 %
step 1800 : loss is 000.22, accuracy on training set 9.38 %, accuracy on test set 19.67 %
step 2000 : loss is 000.22, accuracy on training set 7.81 %, accuracy on test set 10.08 %
step 2200 : loss is 000.21, accuracy on training set 10.94 %, accuracy on test set 24.43 %
step 2400 : loss is 000.22, accuracy on training set 18.75 %, accuracy on test set 15.81 %
step 2600 : loss is 000.21, accuracy on training set 17.19 %, accuracy on test set 19.94 %
step 2800 : loss is 000.22, accuracy on training set 15.62 %, accuracy on test set 21.62 %
step 3000 : loss is 000.21, accuracy on training set 26.56 %, accuracy on test set 14.06 %
step 3200 : loss is 000.21, accuracy on training set 15.62 %, accuracy on test set 17.69 %
step 3400 : loss is 000.21, accuracy on training set 25.00 %, accuracy on test set 27.08 %
step 3600 : loss is 000.21, accuracy on training set 25.00 %, accuracy on test set 24.44 %
step 3800 : loss is 000.21, accuracy on training set 25.00 %, accuracy on test set 22.19 %
step 4000 : loss is 000.21, accuracy on training set 20.31 %, accuracy on test set 22.76 %
step 4200 : loss is 000.21, accuracy on training set 18.75 %, accuracy on test set 27.81 %
step 4400 : loss is 000.21, accuracy on training set 31.25 %, accuracy on test set 26.64 %
step 4600 : loss is 000.21, accuracy on training set 17.19 %, accuracy on test set 25.63 %
step 4800 : loss is 000.21, accuracy on training set 32.81 %, accuracy on test set 23.02 %
step 5000 : loss is 000.21, accuracy on training set 29.69 %, accuracy on test set 28.80 %
step 5200 : loss is 000.21, accuracy on training set 25.00 %, accuracy on test set 26.93 %
step 5400 : loss is 000.21, accuracy on training set 23.44 %, accuracy on test set 29.44 %
step 5600 : loss is 000.21, accuracy on training set 32.81 %, accuracy on test set 27.59 %
step 5800 : loss is 000.21, accuracy on training set 26.56 %, accuracy on test set 25.90 %
step 6000 : loss is 000.21, accuracy on training set 26.56 %, accuracy on test set 29.40 %
step 6200 : loss is 000.21, accuracy on training set 34.38 %, accuracy on test set 32.52 %
step 6400 : loss is 000.21, accuracy on training set 20.31 %, accuracy on test set 20.55 %
step 6600 : loss is 000.21, accuracy on training set 20.31 %, accuracy on test set 30.98 %
step 6800 : loss is 000.20, accuracy on training set 37.50 %, accuracy on test set 24.20 %
step 7000 : loss is 000.22, accuracy on training set 9.38 %, accuracy on test set 27.33 %
In [13]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz%0AAAALEgAACxIB0t1+/AAAIABJREFUeJzsnXd4XNWZ8H9nRr13ybJsyb33gokpphlCFlgSOoSSACks%0AJF/CtyTZbCDZ5EtZlrDZDZBASCGEZlpCQjFggw3GuPduSVaxVUZ9RnV0vj/OvaOZ0ZQ7kkbFPr/n%0AmWdmbn1nNLrvfbuQUqLRaDSaMxfbSAug0Wg0mpFFKwKNRqM5w9GKQKPRaM5wtCLQaDSaMxytCDQa%0AjeYMRysCjUajOcPRikBz2iKEKBFCSCFEjIVtbxdCbBwOuTSa0YZWBJpRgRCiTAjRJYTI8Vu+w7iY%0Al4yMZD6ypAgh2oQQb460LBrNUKIVgWY0UQrcaL4RQswDkkZOnH58AegELhFCFAznia1YNRrNQNGK%0AQDOaeAa41ev9bcCfvDcQQqQLIf4khKgTQpQLIb4vhLAZ6+xCiIeFEPVCiOPA5wLs+zshxEkhRJUQ%0A4sdCCHsE8t0GPAHsBm7xO/YEIcQrhlwOIcT/eq27SwhxQAjRKoTYL4RYbCyXQoipXtv9QQjxY+P1%0AKiFEpRDiASHEKeD3QohMIcQbxjkajddFXvtnCSF+L4SoNta/ZizfK4S4wmu7WOM7WhTBZ9ecxmhF%0AoBlNfAKkCSFmGRfoG4A/+23zP0A6MBk4H6U47jDW3QX8E7AIWApc47fvH4AeYKqxzWrgTiuCCSGK%0AgVXAs8bjVq91duANoBwoAcYDzxvrrgUeMrZPA64EHFbOCRQAWUAxcDfq//X3xvuJQDvwv17bP4Oy%0AoOYAecAvjeV/wldxXQ6clFLusCiH5nRHSqkf+jHiD6AMuBj4PvBT4DJgLRADSNQF1g50AbO99vsK%0AsN54/T7wVa91q419Y4B8lFsn0Wv9jcA64/XtwMYQ8n0f2Gm8Hg+4gUXG+7OBOiAmwH5vA98IckwJ%0ATPV6/wfgx8brVcZnTQgh00Kg0Xg9DugFMgNsVwi0AmnG+zXAv47031w/Rs9D+x01o41ngA+BSfi5%0AhYAcIBZ1521Sjrowg7rgVfitMyk29j0phDCX2fy2D8WtwJMAUsoqIcQHKFfRDmACUC6l7Amw3wTg%0AmMVz+FMnpeww3wghklB3+ZcBmcbiVMMimQA0SCkb/Q8ipawWQnwEfEEI8SrwWeAbA5RJcxqiXUOa%0AUYWUshwVNL4ceMVvdT3Qjbqom0wEqozXJ1EXRO91JhUoiyBHSplhPNKklHPCySSE+AwwDfiuEOKU%0A4bM/C7jJCOJWABODBHQrgClBDu3CNxjuH4D2bw38bWAGcJaUMg04zxTROE+WECIjyLn+iHIPXQts%0AklJWBdlOcwaiFYFmNPJl4EIppdN7oZTSDbwI/EQIkWr47b9FXxzhReA+IUSRECIT+I7XvieBd4D/%0AEkKkCSFsQogpQojzLchzG8pNNRvljlkIzAUSUXfXn6KU0M+EEMlCiAQhxEpj36eA+4UQS4RiqiE3%0AwE6UMrELIS5DxTxCkYqKCzQJIbKAB/0+35vAY0ZQOVYIcZ7Xvq8Bi1GWgL+lpTnD0YpAM+qQUh6T%0AUm4NsvpewAkcBzYCfwGeNtY9ifLJ7wK209+iuBWIA/YDjShf+bhQsgghEoDrgP+RUp7yepSi3Fi3%0AGQrqClQQ+gRQCVxvfJaXgJ8YcraiLshZxuG/YezXBNxsrAvFoyjlU48KrL/lt/6LKIvpIFALfNNc%0AIaVsB15Gudz8vxfNGY6QUg+m0WjOBIQQPwCmSylvCbux5oxCB4s1mjMAw5X0ZZTVoNH4oF1DGs1p%0AjhDiLlQw+U0p5YcjLY9m9KFdQxqNRnOGoy0CjUajOcMZEzGCnJwcWVJSMqB9nU4nycnJQytQFBlL%0A8o4lWWFsyTuWZIWxJe9YkhUGJ++2bdvqpZS5YTcc6dJmK48lS5bIgbJu3boB7zsSjCV5x5KsUo4t%0AeceSrFKOLXnHkqxSDk5eYKu0cI3VriGNRqM5w9GKQKPRaM5wtCLQaDSaM5wxESwORHd3N5WVlXR0%0AdITcLj09nQMHDgyTVINnpORNSEigqKiI2NjYYT+3RqMZWcasIqisrCQ1NZWSkhK82gr3o7W1ldTU%0A1GGUbHCMhLxSShwOB5WVlUyaNGlYz63RaEaeMesa6ujoIDs7O6QS0FhDCEF2dnZY60qj0ZyejFlF%0AAGglMITo71KjOXMZ04pAo9FoxixH34OG4yMtBaAVwbCRkpICQHV1Nddc4z9TXbFq1Sq2b98e8jiP%0APvooLpfL8/7yyy+nqalp6ATVaDTDw5ovwcf/M9JSAFoRDDuFhYWsWbNmwPv7K4J//OMfZGQEm06o%0A0WhGJT1d0NEEHS0jLQmgFcGA+c53vsOvf/1rz/uHHnqIH//4x1x00UUsXryYefPm8frrr/fbr6ys%0AjLlz5wLQ3t7ODTfcwKxZs7j66qtpb2/3bPe1r32NpUuXMmfOHB58UE0k/NWvfkV1dTUXXHABF1xw%0AAQAlJSXU19cD8MgjjzB37lzmzp3Lo48+6jnfrFmzuOuuu5gzZw6rV6/2OY9GoxkB2hvUc5cz9HbD%0AxJhNH/Xmh3/bx/7qwJrV7XZjt9sjPubswjQevCL4XPPrr7+eb37zm9xzzz0AvPjii7z99tvcd999%0ApKWlUV9fz4oVK7jyyiuDBmIff/xxkpKSOHDgALt372bx4sWedT/5yU/IysrC7XZz0UUXsXv3bu67%0A7z4eeeQR1q1bR05Ojs+xtm3bxu9//3s2b96MlJKzzjqL888/n8zMTI4cOcJzzz3Hk08+yXXXXcfL%0AL7/MLbfoIVUazYjhVDdvdLWNrBwG2iIYIIsWLaK2tpbq6mp27dpFZmYmBQUFfO9732P+/PlcfPHF%0AVFVVUVNTE/QYH374oeeCPH/+fObPn+9Z9+KLL7J48WIWLVrEvn372L9/f0h5Nm7cyNVXX01ycjIp%0AKSl8/vOfZ8OGDQBMmjSJhQsXArBkyRLKysoG+ek1Gs2gcDnU8yhRBKeFRRDqzj2aBVrXXnsta9as%0A4dSpU1x//fU8++yz1NXVsW3bNmJjYykpKRlQbn5paSkPP/wwW7ZsITMzk9tvv31QOf7x8fGe13a7%0AXbuGNJqRxmVYBJ2jQxFoi2AQXH/99Tz//POsWbOGa6+9lubmZvLy8oiNjWXdunWUl5eH3P+8887j%0AL3/5CwB79+5l9+7dALS0tJCcnEx6ejo1NTW8+eabnn1SU1NpbW3td6xzzz2X1157DZfLhdPp5NVX%0AX+Xcc88dwk+r0WiGDJeOEZw2zJkzh9bWVsaPH8+4ceO4+eabueKKK5g3bx5Lly5l5syZIff/2te+%0Axh133MGsWbOYNWsWS5YsAWDBggUsWrSImTNnMmHCBFauXOnZ5+677+ayyy6jsLCQdevWeZYvXryY%0A22+/neXLlwNw5513smjRIu0G0mhGI6MsRqAVwSDZs2eP53VOTg6bNm0KuF1bm/qDl5SUsHfvXgAS%0AExN5/vnnfbYz7/b/8Ic/BDzOvffey7333ut5732h/9a3vsW3vvUtn+29zwdw//33h/lEGo0m6njH%0ACKSEEa7s164hjUajGW7MGIHshe6Rj9lpRaDRaDTDjWkRwKhwD2lFoNFoNMONUysCjUajObNxOSAx%0AU70eBSmkUVMEQogEIcSnQohdQoh9QogfGssfEkJUCSF2Go/LoyWDRqPRjDqkVIogo1i9HwUppNHM%0AGuoELpRStgkhYoGNQggzIf6XUsqHo3hujUajGZ10tkBvN2QWw8mdp7drSCrMTxhrPGS0zjfcNDU1%0A8dhjj0W8n5W20T/4wQ949913ByqaRqMZzZiBYo9FMPKKQEgZvWuzEMIObAOmAr+WUj4ghHgIuANo%0ABrYC35ZSNgbY927gboD8/Pwl/vn26enpTJ06NawMA206F47y8nKuu+46Nm/e7LO8p6eHmJiBG1rR%0AktcKR48epbm52fL2bW1tnjkLY4GxJO9YkhXGlrwjLWta8yEW7/hXDk/7KtOPPMHBGfdyatzFQbcf%0AjLwXXHDBNinl0rAbSimj/gAygHXAXCAfsKOskZ8AT4fbf8mSJdKf/fv391sWiJaWFkvbRcr1118v%0AExIS5IIFC+TSpUvlOeecI6+44go5bdo0KaWUV111lVy8eLGcPXu2/M1vfuPZr7i4WNbV1cnS0lI5%0Ac+ZMeeedd8rZs2fLSy65RLpcLtnS0iJvu+02+dJLL3m2/8EPfiAXLVok586dKw8cOCCllLK2tlZe%0AfPHFcvbs2fLLX/6ynDhxoqyrqxvUZ7L6nZqsW7duUOcbbsaSvGNJVinHlrwjLuvBN6V8ME3Kw2vV%0A86bHQ24+GHmBrdLCNXpYKoullE1CiHXAZdIrNiCEeBJ4Y9AnePM7cGpPwFWJ7h6wD+BjFsyDz/4s%0A6Oqf/exn7N27l507d7J+/Xo+97nPsXfvXiZNmgTA008/TVZWFu3t7SxbtowvfOELZGdn+xwjUHvo%0Aq666qt+5cnJy2L59O4899hgPP/wwTz31FD/84Q+58MIL+e53v8tbb73F7373u8g/o0ajGX7MYrJM%0A0zXUv3fYcBPNrKFcIUSG8ToRuAQ4KIQY57XZ1cDeQPuPNZYvX+5RAqCGyCxYsIAVK1ZQUVHBkSNH%0A+u1jtT305z//+X7bbNy4kRtuuAGAyy67jMzMzCH8NBrNGYqU8M73oXJb9M5hxghSx4Et9rTPGhoH%0A/NGIE9iAF6WUbwghnhFCLEQFjsuArwz6TCHu3Nuj2Ibam+TkZM/r9evX8+6777Jp0yaSkpJYtWpV%0AwDbSVttDm9vZ7XZ6enqGWHKNRuOh9aSaIywlFC2Jzjmc9WCPh7hkiE8ZFXUEUVMEUsrdwKIAy78Y%0ArXMOJ8HaQQM0NzeTmZlJUlISBw8e5JNPPhny869cuZIXX3yRBx54gHfeeYfGxn7xdo1GEyk1xgCo%0AtuADpQaNqwGSc1SjubiU094iOK3Jzs5m5cqVzJ07l8TERPLz8z3rLrvsMp544glmzZrFjBkzWLFi%0AxZCf/8EHH+TGG2/kmWee4eyzz6agoGBYLB+N5rSmdp96bj0VvXO46iEpS72OSxkVMQKtCAaBOVTG%0An/j4eJ9hMt6YPv6cnJyA7aFbW1t9WlB7xw2WLl3K+vXrAZU++/bbbxMTE8OmTZvYsmWLj6tJo9EM%0AgGGxCByQZMwcj0vWFoFm4Jw4cYLrrruO3t5e4uLiePLJJ0daJI1m7OOxCKKoCJz1fcVkp3uMQBNd%0Apk2bxo4dO0ZaDI3m9MHdA3WHwR4Hnc1qTkBs4tCfx4wRgHINtdUN/TkiZEx3H5VRrIo+09DfpeaM%0Ap+EYuDthohHTi0acoKdLKZkko6ZolMQIxqwiSEhIwOFw6AvYECClxOFwkJCQMNKiaDQjR43hFppy%0AoXqORpyg3Rha71EEOkYwKIqKiqisrKSuLrRZ1dHRMaYucCMlb0JCAkVFRcN+Xo1m1FC7H4QdJp2n%0A3kfDIjCH1puKQMcIBkdsbKxPJW8w1q9fz6JF/coZRi1jTV7Nac4//hXyZsLSL42sHD2d8PzNcMF3%0AYXyUCr1q9kP2lL5AbjQsArOq2DtG4O4EdzfYY4f+fBYZs4pAo9EMA7ufB2GD+TdAXNLIyVF3EI6u%0AhZxp0VMEtftg3EJIzAJbTHQsApefRRBndBXtauubWDYCjNkYgUajiTI9XdDRDO2NsPuFkZXFcUw9%0AV26JzvE726CxjK6cWVzy6Aba43OiZBEEiBHAiMcJtCLQaDSBcXrF3z55XPXfGSlMRXByl3ITDTV1%0ABwH4uDWfI7VtOERmdGMEiUZlcbxhEYxwnEArAo1GExhTEcy+CuoPwdH3Rk6WBkMRuLvg5O6hP76R%0AMfTUYZWoUSczouQackBCRl9rfI9rSFsEGo1mNGIqguV3Q0oBfBL5aNYhw3EUcqar19FwD9Xupycm%0AiY8cKSTF2al2p0NblGIEZqAYvBTByNYSaEWg0WgC01arntMKYfmdcOw9qD04MrI4jkHxZyB9QnQU%0AQc0+ysQE8tOSuGphIeVdqeruvadraM/jcvTFB0DHCDQazSjHtAiS82DJlyAmYWSsAleDKsTKngpF%0Ay4ZeEUhJz6m9bGkv5NbPFDMxK5mK7jS1zlk7tOdyejWcA4g3OgbrGIFGoxmVOOsgNkkFNJOzYf71%0AKnvI6RheORqOq2dTETRXQMvJoTt+Ww0xHY0ct03kpuUTKcxIoFZmqHVD3XzO5ehrQQ1eFoFWBBqN%0AZjTirPP1Z6/4OvR0wLanh1cOx1H1nDVFKQKAqq1Ddvjm8p0A5E9bQkZSHIUZidRKI6d/KOMEUipF%0AEDBGoBWBRqMZahqOw68WQXPlwI/RVqvcQiZ5M1Ufnk+fGnrfeSgcx1RRW2YJjJuvuoNWfDpkh9+1%0AdRMAF51/AQDj0r0tgiFUBJ0t0NvtGyOINYr0dIxAo9EMOZXblDI4uWvgx3DWQ3Ku77IV96i75H2v%0ADk6+SHAchYyJEBMHMfEwbgFUDo1F0Nnjprl8J032LCZNnAhAfloCDpGORAy4qKyts4eaFr855Z4+%0AQ14Wgc2mrAIdI9BoNENOS5V6HswdrbMWUvwUwdSLIGcGfPLr4Sswazim4gMmRcugeofqzzNI/rqz%0AmhJ3GTJ3tmdZrN1GTmoybfaB1xL8/M2DfP6xj327I/tXFZvEpWjXkEajiQKmIhhom4Te3sAWgRCw%0A4qvK0jixaXAyWkFK5RrKmtK3rGgZ9LT3tY0e8KElf9h4jOm2ajJKFvisG5eRgMOWOeDv71BNK1VN%0A7ZzytgrMPkPJ/oogWSsCjUYTBVqq1fNALYL2RpBu3xiByfwbVIO0Tb8euHxWaatRF0l/iwAGnUa6%0A6biD9pojxNOFyJ/js64wI5Ga3oFbBBUNLgB2Vzb3LTQ7j/pbBPEpOkag0WiiwGAtAk8NQU7/dXFJ%0AsOQOOPh3aCgd2PGtYvYYyp7ctyy9SFU6D1IRPL2xlKWJRhpq/myfdYXpCVR2pyEH8P119rg9lsAe%0Ab0UQKEYAp3eMQAiRIIT4VAixSwixTwjxQ2N5lhBirRDiiPE8cr1XNZrTlcFaBGYhVUoAiwBg+V1g%0As8Onvx3Y8a1ipo56WwRCQNHSQSmC0non7x2s5erxzSojKXemz/px6Ymc7E1XmVO97oiOXdXY7gmf%0A7K7yswjs8X21AyaneYygE7hQSrkAWAhcJoRYAXwHeE9KOQ14z3iv0WiGip6uvvYQg7YIcgOvTyuE%0AOVfD9mego2Vg57BCwzGVLpo+wXf5hOUqK8q8y46Q339USqzNxuL4asia3G9IvaolyEBId59LxyIn%0ADLfQ5Nxk9lQ29QWMzRoCIXx3OJ1jBFJhfrpY4yGBq4A/Gsv/CPxztGTQaM5I2k4BUvn3B3BHq47h%0A1V4iGCu+rpql7fjzgMS0hOMYZE5S1oc3njhB5Gmkzm7JS1sruXJhIfENByFvdr9tVHWx4ayI0Koy%0A4wOfmzeORlc3lY3taoV/VbHJKIgRRHVCmRDCDmwDpgK/llJuFkLkSynN+vBTQH6Qfe8G7gbIz89n%0A/fr1A5Khra1twPuOBGNJ3rEkK4wteQcja3rTfhYB9QnF5Dhr+ejdv9EdlxHRMSYd38pEbHzw6S7l%0AOgnCwvTZxH/wKG1zHo7Kd7vsxG7aE8ex1+/YNncn5wg7FR+tofRkZDO+1x5z0t4tWBxTiWwopSzt%0ALMr9jt/SKT1FZbs/XktDdoPl4390sJNYG2S4VDHf8+98zLKCGBadKsVtT2S337mm1DYyztXExiDf%0A33D8bqOqCKSUbmChECIDeFUIMddvvRRCBExGllL+FvgtwNKlS+WqVasGJMP69esZ6L4jwViSdyzJ%0ACmNL3kHJuqcedkLO/NWwbgsr501WFbmR0PIyNOSy6oILQ2+X9x148VYmdh1m/qr7ByZvMHp7YUMN%0AyQuvCvxdHJ1Hsb2G4gi+px53L99a/xZnT87ipuWxsEsyafnlTJrtewwpJY9uOAHA/El5sNj6OZ6v%0A2MbE7FZu+adz+dmWt+nNKGLVqpmwqwvGz+v/WXo/gso3WHX++f3dRgzP73ZYsoaklE3AOuAyoEYI%0AMQ7AeB7i9n4azRmOGSguXKyeBxInCFRDEIjpn4W4VHLqB9fyYdMxB9tPNPoubKlUg929A8XeFC2D%0Aqu0Rub7e3HuKhg7Jl8+ZpIbVA/iljgIIIbCnFag3EfYbOtHgYmJWEvExdmYUpPZlDrkagmRhJQMS%0Aul0RnWcoiWbWUK5hCSCESAQuAQ4CfwVuMza7DXg9WjJoNGckLdUQlwo5xgV0IJlDbbXWFEFMHExZ%0ARVbD1gFXGte2dvDlP27hlqc2c7zOK2jq3WwuEEXLVJC1zvqMhD9+XEZ+kuDCmXlQux9iElUPowDk%0AZKTTJlIi6kAqpaTCUAQA88ZnsLuyCdnTCZ3N/WsIoG9c5QjGCaJpEYwD1gkhdgNbgLVSyjeAnwGX%0ACCGOABcb7zUazVDRUqWyelIGdkcLGJ1HLSgCgGmXktDpgJq9kZ8HePTdI3T19BIXY+Oev+ygo9u4%0Aw/fUEASzCJaqZ4sN6I7WtrK1vJFVE2Kx2YSqTM6b2T8QbTAuI4E6MiL6/prbu2nt7GGCoQjmF6XT%0A0tFDVbVR1xFIEZgdSDtHbkpZNLOGdkspF0kp50sp50opf2Qsd0gpL5JSTpNSXiyltB6F0Wg04Wmp%0AVoogNgES0gfWU99ZF7yGwJ9pq9Xz4bcjPs3R2lZe2FLBLSuKeeS6BRw42cKP/264bBzHIDYZUgsC%0A75w1WV1YLWYOPf9pBbF2wcpCIzRaux/y+ruFTMZnJHLSnY6MwKIyU0cneCyCdACOlpWrDUIpgtPU%0AItBoNCNBSzWkjVevUwoitwi6nMpfHcif7Udvr2TN4W6aU6bAkXciFvXnbx0iMdbOvRdO5cKZ+dx9%0A3mT+/MkJ/r77pHINZU8OGEAFjMIyaxPLOnvcvLy9kktm55MWL1R6rLMuYHzAZFx6IjUyA3dL5IrA%0AdA1Nz08lLsZGVbXRDjxojIARrSXQikCjOZ1w96gLf7qhCFLzI7cIzGK0UDUEBp+UOrj/pV28070Q%0AWbmlr8OmBT4tbWDt/hq+tmoK2SnxAPzfS2ewcEIG33l5N911R4O7hUyKlkL9IdUbKQRr99fQ6Orm%0A+mWq1TS1RsO6/P41BCbmpDJbW43l+Ie/RRAXY2PWuDTqa0K4hsxxldoi0Gg0Q0LbKZC9yjUEA7MI%0AzKpiC66hT0vVhf/PLQsQsheOvmvpFFJK/t8/DpCfFs+XVk7yLI+12/ifGxcRK3qwNZfjzpgc4ihA%0A0XL1XLUt5GbPf1rB+IxEzp1q3JGbGUMhXEPmpDJbbxd0NIX9TAAVDe1kJceREt+XmT9/fDqtDYYy%0A9u8zBH0WwekYI9BoNCOAmTqa5mcRRJLRE6rhnB9byhqYWZBKb9Y06mU6Lbv/ZukUb+49xc6KJr59%0AyQwS43yDtROyknj00izs9PK3qqTQBxq/GBAh4wQVDS42Hq3nuqUTVJAYlEWQnNt/3oIXvpPKrFlV%0AFQ0ujzVgMq8onRS3oUgSA7RW0zECjUYzpJhdR70tAnen5TtawLJrqNvdy/byJs6alMWd8xP5xLYI%0AcfQ9XB0dIffr6unl528dZEZ+Kl9YUhRwm/OyVO79nw7ZWbs/xEU4PlW1iAgRJ3hhSwU2Adcu9TpX%0Azf6ArSW8SU2IpS3OUIYWraoTXqmjJvOL0smkla7YNLAHqOHVMQKNRjOkeCwCQxGYGTeRxAnMRm5h%0ALIJ91S20d7tZNimL9HjBlHOuIRUnf3rxpZD7/WVzOeUOF9+5fCZ2W5BAcINKHY0vmM79L+2iqqk9%0A+AHNTqS9vf1W9bh7eWlbBedPz6Uww2gsJ92q9iBEoNhEpFj//nrcvVQ1tTMxy7eB3dTcFHJtrbTa%0AgrT58KSPakWg0WgioMHZRbe7/4WPlmo1ED3BuOikGK28IokTOGtV2mlMfMjNthjxgeUlqpHarJVX%0A4hZ25OG3+euu6oD7tHR086v3j/KZKdmsmh6iTsFxFBIy+OnNq3D3Su57bkfgzwuqE2lHc18Bmhfr%0AD9VR09LJDcsnepYltteorKgwFgFAfOY49cLC93eyuQN3r2RCpq9FEGO3URTfTl1vSuAdY+JUh1Vt%0AEWg0Gqt09ri5+JEPuPqxj/oPSG+uVPEBM+VyQBaBtWKyT8saKM5OIi/NaPqWkI4o/gyXJ+zhe6/s%0AodzR3+f9mw+O0eDs4rufnYUIlhYKqoYgeyolOcn8v8/PY1t5I4+sPRx42xATy57fcoKclHhVSWyQ%0A7DRy+kNkDJlkZmXjIsHS91fhlzrqTb69jaquJNy9QWI1IzyTQCsCjWaMsa28kQZnF/uqW7jqfz9i%0AX7XX8BOzmMxkIBZBW13Y+ICUkq1lDSwt9m2rbJt+KcU9ZRSKeu59bgddPX138Seb23lqQylXLSxk%0AXlF6aBkcfQPrr1xQyI3LJ/L4+mOsOxigNVn2NGXBVPpWGJ9q7uD9g7Vcu7SIWHvfpU4pAgG5s0LL%0AgJpUVtObTk9zYAvHG//UUW/SZAt17hTfFhrexI1sK2qtCDSaMcaGI/XE2AQvfuVshIBrn9jEeweM%0AO1bvYjJQwdTYpAFYBKHjA8fq2mh0dbN8kl8WzLRLAfjlohp2Vzbzn2/39QF65J3DSAn3r54R+vzd%0A7arhXHZfj6EHr5jN7HFp3Pf8Do75X0xtNhi/tF/m0JptFfRKuH6p71CblLYyyJqkRm6GoTAjkVoy%0A6W4+GXbbEw0uYmyCcel+bbGlJKG7iQbSfGcYexOf0i99VErJlrKGvsE2UUQrAo1mjLHhSB2LizNZ%0AVpLF6/esZEpuCnf9aSu/33AU2XrS1yIQQlkFkcYIwtQQfFqqCriWlfgNWsmZBpklzGn7hFvPLubJ%0ADaW8f7CGg6daWLO9kts+UxzwjtmHhuPq2UsRJMTa+e2tS4iz27jrT1tp6ej23adomWoZYVxMe3sl%0AL2yt4OzJ2ZTk+I6GTHaWW4oPgKourpPWhthXNLZTmJFIjN3vstrZgujtps2Wxp6qIIogLrmfRbDp%0AmINrn9hF2VPSAAAgAElEQVTEllMDGCwUIVoRaDQWWX+otu/Oe4RwtHWyt6rFUxiVl5bAC19ZwSWz%0A83n8758gpBt36jjfnVILrFsE7m5VpRsmRrClrIGclDgm+V1kEUJZBaUf8L1LiplZkMq3X9zFv7+2%0Al9T4GO65IEylMPQ1m/PrOlqUmcRjNy/mhMPFN57b4etvL1qmCumqtgPw8TEHFQ3t3LDcb8RldzuJ%0A7acsZQyB6jdUKzOIcYXvlh8odRTwZGElZRawuzJIGm+AGMHTH5WSnRzHwrzATfGGEq0INBqL/Ofb%0Ah7jvuR042jpHTIaPjqn5ued6ZdwkxcXw+M1L+PoS5ZL47y0u3zvmlHxoDe/aALxSR0Mrgk9LG1hW%0AkhU44Dt9NfR0kFD5Mf9702I6unvZUtbIv1w4lYykuPAyeAbW928/fdbkbB66cg7rDtXx8DuH+lYU%0ALVHPRsD4uS0nSE+M5dI5fg3r6g4i6LVsEeSnx1MrM4h1u8KmdwYqJgM8bTeyc8exr7qFnkDZT34W%0AQWm9k/cO1nLzimLi7CGC6kOEVgQajQWklJTWO3F2uXnig2MjJseGw3WkJ8Z6ulqa2GyC2+fEArCu%0AOoZrHv/Yk8VCaoH14TThhtYD1U3tVDW193cLmRSfo+ISh99mal4Kv7x+AZfOyefWs0usyeA4ppSX%0A2YPHj1tWFHPTWSp4/PpOo4AuMRNypkPlVhqcXbyz7xRXLxpPQqzf3XSIYTSBiI+x0x5vfBchvsO2%0Azh4anF2BLQKXUq6F44vo7OnlSG0AhRKf6qNo/vBRKbE2G7esmNh/2yigFYHmzODkLlVINEBqWztx%0AdblJTYjhj5vKOdUcuno2Gkgp2XCknpVTswMXYhnFZN+/6RJONndw9WMf8fHRenVR7WqzVrDkNFwg%0AIWIEW8qM+oFJQRRBbAJMXqW6kUrJZXPH8ZsvLu1/UQ5Gw7GwzeYeumIOy0uy+Nc1u/smgBUtg4rN%0AbH/rj1woN3NXzl7Y/1ffx+E3cdviVAtri0gz8ypEnKDCkzGU2H+lS1lxkyaqi/qeQAHjuGSPa6i5%0AvZuXtlVyxYJC8lIjm8c8ULQi0Jz+NJTCb84jr/ajAR+itF6Z7d+7fBZSSv7n/SNDJZ1ljtW1caql%0Ag3OnBblbb6mCmATOmj2VV7++krTEWG7+3WbeKjd86VasAguuoU9LG0iJj2FmQeA7dkDNKGiugNoD%0A4c/pj+No2At1XIyNx25ZTHZyHHc/s5W61k4oORfaG7h4z/38Ju5Rxr9zN7z4Rd/Hgb/Rmjo16DCa%0AQMSkhS8q828/7YPxnRaNn0BqfAy7qwLECbxiBC9sOYGry82XzimxLONgierweo1mVGAEH5NcVQM+%0AhKkIzp2Ww/XLJvD8pxV85bwpTMwOn4I4VHx4WF1QzpkaJLXTrCEQgql5Kbxx7zk8+Po+nt2xm8vi%0AoP5kOTkB/O4+ePoMBVcEW8oaWDQxo392jDfmsJojb1sq3PLQ0azcU+HaTwM5KfH89talXPPEx3z9%0A2W08++XrOMwU7n9hG9+8eCqXzRkXcL89e09wrnWJSMweD5UgW08RzFsfqpgMlwPs8djiU5gzPo09%0AVS39t4lLAXcXPV0d/PHjclZMzmJOYZhaiyFEWwSa058mVUma0BE+8yMYZfVO4mJsFKYncu+F07Db%0ABI++G6TSNUpsOFLHpJzk4OmXfjUESXEx/Oe1C7jj0hUA/GLNB7yzL0wapLMO7PFB/fONzi4O17R5%0A2koEJX08FMyDwxEOqwk3ntKPuePT+cU1C9hS1siDf9vP7w/HUxE7iXPPuQAK5gZ8uGMiU96Z2Xl0%0Ayhi6GoMH3CsaXKQmxJCeGNt/pcuh6jKEYH5RBgdOtvgU2gGeucXv7y6lqqndpzX3cKAVgeb0p+kE%0AMDhFcLzeSUl2EjabID8tgds+U8KrO6s4UjM8PeQ7e9x8cryBc6eFKPQyZxX7ceHS+QBMTXJy9zPb%0AeOiv+/rmAvtjjqgM0v5hW7lRPxAsPuDNtEuhYnPYoTE+BKghCMeVCwr52qopPPfpCV7dUcmVCwtJ%0Ajh86Z8e4jCTqyKC9MXh18YkGFxMykwJnUbkckKS+r3nj0+nq6eWw/+/G6ED68ieHKM5O4qJZ+UMm%0AvxW0ItCc/ngsgoHXAJTVOynJ7suZ/+r5U0iOiwne/2aI2V7eRHu3O3h8oLcXWk4GVAQkZYEtli8t%0ASOTL50ziDx+X8fnHPu5foQth+wxtKWsg1i5YOCFIJ01vpl+qAvRH3wu/rYnjKCAgM7I74vtXz+DC%0AmXn0Srhh2dBm2hRmJFAnM3C3BLcIgtYQgIoRGANp5hutNfoVlhkdSI9X1XD7Z0qCd2WNEloRaE5/%0ADIsgvtOhCqYixN0rKXe4mJTbpwiykuP40jmTeHPvqcBZIEPMhiN1xNgEKyYHuRN31kFvt297CRMh%0AILWAGGct//5Ps/ndbUs52dzOFf+zsS/90qStNnSguKyB+UUZ1jKAxi9RoxkjmWXsOAbpE1TmUQTY%0AbYLHbl7Ma/esZIEVJRUBhUZRmd0Z+Eait1dS0dgePF7kcnhGVE7MSiItIaZ/qwlDEeTGdXOtX0uM%0A4UArAs3pT9MJiElQhUQtkQeMq5va6XL3Minbt4r2znMnkZEU61vYFCU2Hq1n0cQMUhMC+KDBayBN%0AAEUAPm0mLpqVz5vfOI85hWl8+8VdlNV7tTZw1ged2tXe5WZPZXPw+gF/bHaYejEcWQu9FlN3HUcj%0Acgt5kxBrt2apREhuSjz1ZBDfURdwfV1bJ109vcFjN2aMABBGnGCPX+ZQfbf6u35uRorPmMvhImqK%0AQAgxQQixTgixXwixTwjxDWP5Q0KIKiHETuNxebRk0Gjocqm75QlnqfeGdRAJZsaQfzuFtIRYvnr+%0AFD44XOfJrY8GDc4u9lQ1B3cLQf+BNP74tZkoSE/g1zctJsYu+KUZ9JYypGtoR0UjPb2yf6O5UExb%0ADe0NIUdJepDSqCEYmCKIFjabwBWfS2JPC/T0ryr3dB3NDFBD0NMFnS0+Q+vnFaVz6FSrT5zmbwdV%0AJtGl00Kk5EaRaFoEPcC3pZSzgRXAPUIIM4/sl1LKhcbjH1GUQXOm01yhnicZCYMDUARljsCKAOC2%0As0vITY3nP986FLUukR8drUdKwgSK/WYV+xOg8VxeWgJ3rJzEX3dVc+Bkiwrq9nYHbUG9pbQRIWBJ%0AsUWLAGDqRSDsKo00HC6HSh+1mDE0nLjN7yRALcYJR5jUUfBRBPPHp9Ptlhw6pQLG7V1uXt6jLISc%0AuMhdl0NB1GwQKeVJ4KTxulUIcQAI8ivVWKKzFco/VkE4jTXMC//Es5HYEANQBMfrnCTH2cl1HoJ9%0AvkVpicBjkxt4c+8pjv1tA1Nzg0yh8iYpC+ZfHzQzx5+NR+pJS4hhflEIt0dLlZpy5XXB8SG1QF3o%0Aezp9Jo999bwp/PmTch5++xC/+5yRtx7EIthS1sCM/NTAKZLBSMxU1tjhd+CiH4TeNkizudGALbUA%0AmlFWVYZvMLqi0YUQMD6QRRBAEZizGHZXNbNgQgYvb6+kpiMGEhix4TSWFIEQ4hXgd8CbUsog8+JC%0A7l8CLAI2AyuBe4UQtwJbUVZDv/wyIcTdwN0A+fn5rF+/PtLTAtDW1jbgfUeCUPIWVr3J9CNPsGXp%0Af+NMKRlWuQIxFr7bwqq1TAc+PlTLorhMmg9u5qBYH9Exth/uICdB0vLcXaS39I8HLAOWxQLbIzhm%0AWSMt6cEHo5jfrZSStXvbmZ5uY8OHHwTdftaRHaTFZrL5ww8Drh9X3cQMYNO7r9OZ4HvHv3qC4OWD%0AtfzNvokrgJ3HqmlqWO+zjbtXsqXUxcrxMQH/5qF+C+Pj5jDtxJMc+fO3qSq6IuhnKDj5HjOBzUcd%0AtFcHPtZQMJDfbUOHcp7s/uRdGo75tovesr+TzHjBpo0b+u2X0biLhcCOI5U016lzSilJjYV3thxk%0AfPtxfr2xnZzUJOiGYwd2U+H0lW1Y/s+klGEfwMXAs8Ax4GfADCv7GfumANuAzxvv8wE7yi31E+Dp%0AcMdYsmSJHCjr1q0b8L4jQUh53/+JlA+mSfnBfw6bPKEYE9/t29+X8ke5UrrdsvGRFVL+7rKID3He%0AL96X9/x5q5Q/nSDl6/8ipaux3+OVj/fKeQ+8INduOxRwvefRVCHlQ5lSrn0o5DnN7/ZITassfuAN%0A+ewn5aGFfPqzoT/bobfUb+fEp/1WOTu75ZL/WCv/65c/U9uc2ttvm50nGmXxA2/I13dWhZQ3IO4e%0AKZ+7ScoH06Xc91rw7dY+JOUPs6Ts6Q6+zRAwkN/tS+u2SPlgmmz+4LF+6655/CN57RMfB95xzxr1%0Andbs91l86+82y0t/+YF8/2CNLH7gDfnqtgr1/bz34yGR1wTYKi1cpy3FCKSU70opbwYWA2XAu0KI%0Aj4UQdwghgtqJxrqXgWellK8Yx6qRUrqlsiyeBJZHqLvOXMw+MJGk453pNJ2AjAlgs9GRkBdxjKCr%0Ap5fKxnZmZ/Qo/3XuLEjM6Pe4YvkscnLz+Pn6k7jj0wNuQ2IGpBfBxLMt/w03HlGZKiHjA6BcQ+kh%0APK8hRlYmxcVw74VTaaozMo8CxAg8jeasZgx5Y7PDF56CoqXwyt1wYnPg7RqOQWYJ2Edf55vM3ELc%0AUuB09M86C1lDYLSgNusITOYXpXOkto3H1x0jLzWey+cXjujcYsvBYiFENnA7cCewA/hvlGJYG2R7%0AgXInHZBSPuK13LsByNXA3oilPlMx/Y2VW/p+YKOYB1/fy1t7I5iMFQ2aTkBGMQAdCfnQWq0yOSxS%0A0ejC3SuZHW9UJQcJZMbYbXzrkukcqW3j1R1hUlSnr4aavWrQfBg2HKmnJDsp9FQvKfvPKvbHM8Q+%0A8N/jxuUTmZTowo2N3oT+WUFbyhqYkJVIgf8YRqvEJsKNzysZn7se6gM07XOE7zo6UhRmpeAgne4m%0A36Kyjm43NS2doYvJQMVKvJg7Ph13r+TTsgZu+0wJcTE21WZiNCsCIcSrwAYgCbhCSnmllPIFKeW9%0AKNdPIFYCXwQu9EsV/YUQYo8QYjdwAfB/Bv8xzhBcDohPU5OYjr470tKEpLdX8uzmE7wW7qIYbZpO%0AeIJ7HQl56ruLoJbAzLEvEcYFNERq4+Vzx7FwQgY/e/MAza4Q2R/GXN9wVkFXTy+bjjtCp42C+l24%0Au4JnDIEKAAtb0A6kcTE2zi+EBpnCm/t98+WllGwta7RePxBUhhy45WWVRfTnL/Q1uANVGd1wfFQG%0AigEK01VRmb9FVdnYDgTJGAL1t0nM7GflmBXG8TE2blxuBJ/jkq21Co8CVi2CX0kpZ0spfypVNpAH%0AKeXSQDtIKTdKKYWUcr70ShWVUn5RSjnPWH6l//E0IXA5oOQc9U992EI63gjS6Oqip1dyNFAbg+Gi%0As00NBfFWBBCRe8isISjorgRbTL+MEW9sNsGP/3kuDc6u0EVmuTPUccI0ZNtxohFXl9uaWwhCWwQ2%0Au/rdhOipPynRRZs9k/9ae8hnitaxOicOZ9fA3EL+ZE2Gm15USuAv1/VN5Wo9Cd2uUVdDYJKWGIND%0AZBLrN7Iy5BwCUL+/AJlcBWkJTM5J5sblE8lKNqa2xaX0m1s8XFhVBLOFEJ7cNSFEphDi61GSSRMM%0AZ736Z556ibII3D0jLVFQaltV4U1ZvZPuQKP5hgOzhsCjCAw/udF7yAql9U4ykmJJaClTLiZ76NTJ%0AuePTufXsEv68uTz4fFqvub50Bx9ws+FIPXabYMWUICmhJuGKyUxS8kPOJBDOOlJzCjle5+SV7X1W%0AkxkfsNRozgpFS+Cap9WwoDVfUr/jBrPr6OhUBEIInHE5JHXV+yz3FJOFsggCKAIhBG9+81z+/Z+8%0AWnSPgRjBXVJKz69aqnTPu6IjkiYgUvb9qKavho4mz3zW0UidoQh6jD49I4J552/ECDrjs5V7JEKL%0AYFJOckT+62+tnk5OSjz/9upe3wHr3ky/VN0Bl20MepwNR+pYNCGDtGBtJUzMWEMo1xAY1cUhYjbO%0AOrLzxrNgQgaPvnvYU/m6pbSB7OQ4JgcoqBswMy+Hz/4CDr8Fb/5frznFozNGANCVmEuKu8mnXUZF%0Ag4uEWBu5KfGBd3I6+gWKTeJj7L7N5UZ7jACwC6/+qkIIO2BhCrVmyOhoUp0ck3NgyoXKTWGlWnOE%0AMC0CgKOBZrQOBx5FoCwCaYtRF8sIFEFZvZNJWUkRtT5IS4jl+5+bxZ6qZv6yOYj1UXIOxCQG/Ru2%0AdUl2h2srYdJSrX4PQSqCPYSxCHDWIVLy+NdLZ1Dd3MGzm9X39GlZA0tLMgO3WB4My++Cld+ErU/D%0AhkcgJgFSw1g1I0lKAXZ6++Y605cxFPS78WpBHZYxECN4C3hBCHGREOIi4DljmWa48KShZUNCukpB%0AHMVxgjovRRCw3fFw0FSuLi7e83czii0rgvYuN9XNHcxNc0Xsv75yQSErp2bzi7cPUdsawP0Tm6jm%0A+h5+S1l7fuxvcCMlnBMuPgBKEaQWgi3Mv3NqgdGlNEADuC6XuhtNzmHl1BxWTs3m1+uOcrS2lcrG%0AEIPqB8tFD8Lca5QbL2tK+M8wgpgjK7ua+uYSmHMIAmJa8ckW/oYwJmIEDwDrgK8Zj/eAf42WUJoA%0AmGloppk5bTXU7oemipGTKQS1rR0kx9kZl57AsZG0CDIm+rZyyJhoWRGUN6h/yplxxh1gBG4LIQQ/%0Aumound29/PQfBwNvNH21kqWuf2B5b72b1IQYFhRZGFcYZCBNP1LyVdaUM0AXTXOZYVX830tn0uDs%0A4hvP7wRCDKofLDYb/PNjMONymHFZdM4xRCRkq++4qUb9z0kpqWhwBY8PdLao3k3B2n74M9pjBFLK%0AXinl41LKa4zHb6SUFvvKaoYET88S4x/S7Dc0St1Dda2d5KUlMDUvZeQyhxrL+2f5ZExUd9AWaglK%0A65QiKMa4A4wwtXFKbgpfOX8yr+6oYtMxR/8NvOf6eiGlZF+9m5VTckLPBTYJV0NgEqqWwG9o/cIJ%0AGVw6J5991S0kx9mZPS4t/PEHSkw83Phc+F5EI0xGrpoT0FqvAumNrm6cXe7wNQRBYgT9iDcsgt7h%0AT66wWkcwTQixxmgpfdx8RFs4jRemIjDNzJzpys0R6UzYYaK2tZPclHim5KZwrLYtap05Q+JVQ+Ah%0AYyIg+zKKQlBqdB3N7a5SLqZwwdgA3HPBVCZkJfLvr+/tP6c2vQjy5/b7G5bWO3F0SGtuISvFZCYp%0AhiIIFCdwGmmRXrMIvr16BkLA4uJMawrpNCeroAiAjkYVnD8RamA9+LpzrRCXDEjlhhxmrP51fw88%0AjmotfQHwJ+DP0RJKEwCXeXdh/KiEgOmXQemH0N0+cnIFob61k9y0eKbkpeDscnOyOXiaZFTobFV9%0A8AMqAiy5h0rrnOSmxhPXdFzlvw/Af50Qa+dHV87laG0bT20McO80bTWc2ATtfammHx5WbprzrASK%0A2xuhp92akko10mcDWgS+riGA6fmpPHLdAr51yfTwxz4DKMzOoFGm0Nuivr/wqaOmlRWBawhGJE5g%0A9ZedKKV8DxBSynIp5UPA56InlqYfLofKMonzSuGbvlpdBEr7dz0caUyLwGzLPOyZQ2bsxEgd9ZBp%0AvLegCMocZurowKdmAVwwM49L5+Tzq/eOUNnod7dnzPWVx97n46P1fO3P2/iPvx+gMEUEH33ojZVi%0AMhNPv6EAFoFZ5evXgvrqRUUsmhjBIJrTmIRYOw6Rid2wnsIXk/VvQR0SjyIYfleqVUXQKYSwAUeE%0AEP8ihLia4K0lNNHAGaAwpfgciE0adXECV1cPbZ095KXFMzVP/UyGPXPIr4bAQ2qhanFgxSKodzI5%0AKwEaSgfd+uDBK+ZgE4KH/rrfZ3lz9kI6YtN557U/cdNTm9l03MGd50zi20ss9vQxi8nSi8JvGxOv%0A2h0EixHEp0U8K/hMozUm2zOysqLBRU5KPElxQZrkDSRGACOiCKy2+fsGqs/QfcB/oNxDt0VLKE0A%0AXI7+JmZsgpGC+A5cLi0POok2ZupoXmoCOSlxpCfGjoBF4FtD4MEeo7p0hlEELR3d1Ld1MS/VyPwY%0AZKFTYUYi37hoGj998yBr99dQmJHAnz8p57Ud1fyUOayK2c5/XTOPzy0YT0KsnfXrQ+T7+wgagUUA%0AKk4QLEZgNc3xDKYjIZfxzh2AkToazBoA9T9rj/e14kNhbjcCtQRhFYFRPHa9lPJ+oA24I+pSafoT%0ApGcJ01bDoX9A3UHICz7oZDgxi8lyU+MRQqjMoWFXBOXKlRbo4pZRHLbNhNlsbkaM2XV08K0PvnTO%0AJF7eXsk9f9lOV08vCbE2rlownsU5N5Cx/j6+UFALscF7GQWkpVpZOKbbJxyp+cFjBOEK0jS4k/LI%0AbG0EKTnR4GJJcQi3mVlDYPUGLc6YVzwaYwRGmug5wyCLJhSuIKXqZgriKCou67MIVNn9lNzkEXAN%0AlfevITCxUEtgNpubaKaODkHrg1i7jZ9/YT4LizL493+azebvXszPr5nPxOVXqNYXA/kbtlSrtFCb%0A3dr2wSyCtjptEVjAllZArHDTVH+Kk80dwTOGILKqYuizCLpaByfkALAaI9ghhPirEOKLQojPm4+o%0ASjaWqNkPG/4ruucIFCMA5ebInzeqhtXUtqgMoVxDEUzNS6G+rYsml/U5AIMmUOqoScZE1e2ypzPw%0AepQiEAKyOirVnVqQOb6RsmhiJi9+9Wy+fM4k0pOMHkJJWVC0fGCxnuZK624hUBZBW03/amZnrW8F%0AtiYgcZnquz5w5AjuXhl6ToSz3np8ALxiBKPQIjBIABzAhcAVxuOfoiXUmGP7H+G9H0XPt9fTqe4S%0AgqWhTV8NJz5RqYSjgLq2Tuw2QVaSakdlBoyj5R7q7ZX96xSaTvRlCPljBpBDDIYpq3dSmJ5ITKPR%0AYyja8Zfpl6punKGawgXCag2BSUqBml3g/Vtx96ic9yFSdqczqTkqTbe0VHVLDdpeAoJ2Hg3KCMYI%0ArFYW3xHg8aVoCzdmcBgtdAOV7g8F4dLQpqkURI69H53zR0htSyc5KXHYjM6KU3OV7zMa7qGWjm4u%0A/9UG/uONA30LO1rUhS6URQAh4wSerqMRNJsbFNOtDavxwVNMZiFjyCRQLYHLAUitCCyQkad+OzXV%0A6rcTMsU3kj5DMPrrCIQQvxdCPO3/iLZwYwazhW7UFUGQH1XRUkjMGjVVxnVtneSl9qUhjs9MJD7G%0ANuQWgbtXct9zOzh4qpWPjnr1ifebQ9CPMEVlUkpK651MzYpT2wxHa+S82eqCHkmcoKMZup2RWwTg%0AO2nLU0ymFUE4sgtUm4nu5pPE2gUFaUHSbXu6VK+hSCwCe6zKMhqBGIHV9NE3vF4noGYNVwfZ9syi%0Ap6vvghItReD0qyr2x2aHqRfD0bWqs6TVwGGUqG3p9Jlta7cJJucOfebQL946yPpDdUzNS+FYXRud%0APW7iY+yqxxAEVwSp41Tb5sbAFkGDs4uWjh7mJjeqJm3DoQiEUC6+XS+EjF34YHUgjTeefkNeAWNP%0AewkdIwhHTGIaThLIpZGizCTfeQLeRFpMZhI/Mh1IrbqGXvZ6PAtcBwQcUXnG0VSu3DLgO4N1KPHv%0AMxSI6Zeq7aq2RUeGCFAWge+gjim5yUPafO7VHZX85sPj3LJiIvddNE2NxTQVTbBiMhN76LkEZUaP%0AoekxxsVyuOboTrtU3eGXf2Rte48iiKAHkplm2uo1Idav4ZwmNE32bHJFE0WZYWoIIHJFMEIzCQba%0ASWoaoG8foC8+AH3/UEONlR/VlAsHnoI4hLh7JY62Tk/GkMnUvBQqG9s9U68Gw86KJh54eQ8rJmfx%0A4BVzmD1OxSAOnjRM6qYTEJsc+vvKDD6X4LjRdXS82yjWyp48aJktMek81dzOqosv0mIyUHeccam+%0AKaRB2ktoAuOKyyFPNIVJHTWVa4QpuXGpo7fFhBCiVQjRYj6Av6FmFGjM+IAtts/EHmpcDkCo9gDB%0ASMqCCWeNeLsJh7OTXkk/i2BqXgpS9l1kB0pNSwdfeWYreanxPHbzEmLtNkqyk4mLsXHgZIvaKFQN%0AgUmIWoIyh5MYmyCj44RSJqG+96EkLglKzrX+N2ypAkSfu8cq/kVlzjqwx6mBR5qwdCfmkUc4RTAI%0Ai2C0KgIpZaqUMs3rMV1K+XK0hRsTNBxTF4rM4ujGCBIzw/v+p62GU3v6XAYjQG1LX1WxN54U0kG4%0Ahzq63dz9zDZaO3p48talZCWr9NQYu40Z+akcPOVlEQSLD5hkFKuAaYDh8WX1atiIveH48M/QnX4p%0ANBwn0VUVftuWKuXqsYeZaeyPf1GZs05ZA6OkRcloR6bkK4sglGvIGSbBIxijOUYghLhaCJHu9T5D%0ACPHP0RNrDOE4qi4WyXmqOjMaWM1Hnm5MeBrB4rK6NlMR+GZTTMpJxiYGXksgpeR7r+xhV0UTj1y3%0AkFl+g1JmFqRy4GSLqidoCjCQxh9zfYBaguPeA+uHKz5gYlSKZzu2ht+2pVoVFEZKIItAVxVbJiWn%0AiCTRyYzsEIrTtAgitSZHKEZgNWvoQSnlq+YbKWWTEOJB4LVgOwghJqDmFuQDEvitlPK/hRBZwAtA%0ACVAGXCelHB2VUAPBcVwNIu92BRw5OCRYzUfOmwXpE+DIWlhye3RkCUNdi297CZP4GDsTs5IGPLby%0AqQ2lvLKjiv9z8XQum9vfFTJrXBovbaukvr6W3I5m64qgqQxy+u76pZSU1Ts5rzgJyqqHp4bAm8xi%0AyJ1Jbt1H4TPAWqohZ1rk5zAtAmk0Kmyr1X2GIqC4eDJsh8l/vRZi4gJv1FShlIDd6iXWYDTHCIJs%0AF+4T9gDfllLOBlYA9wghZgPfAd6TUk5DzT7+jlVhRx1dLmipVBZBSl50YwRWLAIhYPwS1YBuhOiz%0ACOL7rZsywBTS9Ydq+embB/js3ALuvTCwq8a0EMqPG8rYsiLwjRPUtHTS3u1mXpJxRzfcigDgrK+Q%0A3hvG3ckAACAASURBVHII3vl+6O1aqgc0NY3UfHXj0mm40pz1OnU0EiadB7OuUN9jYmbgx7j5cPY9%0AkR97hGIEVtXVViHEI8Cvjff3ACHzFKWUJ4GTxutWIcQBYDxwFbDK2OyPwHrGauC5wZg4lT1ZpZC2%0AN4K7O3KfbTic9VC0zNq2qQVw9L2hPX8E1LZ0kJoQQ0Js/zvZqXkpbDhST4+71/Low+N1bdz73A6m%0A56fy8LULPNXK/swyMofqK46oBcHaS5ikjlMBfj9FYDabm2IzXCfDHSMAWPolKne8T9Enj6k5A4Eu%0AKB0tqmApkowhE++RlfGpugV1pKSNg+ujNKAxPmVUu4buBf4d5dKRwFqUMrCEEKIEWARsBvINJQFw%0ACuU6CrTP3cDdAPn5+axfv97q6Xxoa2sb8L7hyKn7mLnA1tJm0loamQ58/O5f6YqPMFPAi37ySsn5%0AznpO1DsptfA5JtY6mdzVyofvvUWvPbpDRgJ9t/uOd5Bi7w34nfc0dNPl7uXlt9aTn2xNEfx2dyc9%0APT18eXoPWzZtDLltVoKg5tguADbuq6DncHNIec+Ky6bl8FYOxPQtW1/RDYC77BMAPtxXRe/BAIPn%0Ao0xbwXXEd9aT8/a/sb+iibq8lT7rk5wnWA7sr2ymNsLfd0bjKRYCOze8RWvqZM51d3H0VCuVg/g/%0Aieb/2VAzmmWdWFXL5N5uPnh/LdKmbiiHRV4pZVQfqElm24DPG++b/NY3hjvGkiVL5EBZt27dgPcN%0Ay4cPS/lgmpQdLVLue129rt45qEP2k9fVqI778f9aO8COZ9X29UcHLMOp5nb5wJpd0tXZE3K7QN/t%0AFx77SN7wm00Bt99W3iCLH3hDrt13ypIcHd09cu4P3pL3v2jtO7396c3y5Z/cIuVPCqXs7Q0v7x+u%0AkPLJi3wW/eTv++X0f/uH7H3lK1I+PNPSeaPBunXrpOxySfnUJVL+KFfKso99Nzjyrvo7l30U+cFr%0ADqh9d78kZd0R9Xrn84OXd4wwqmXd9Lj6ezgdnkWDkRfYKi1cp61mDa0VQmR4vc8UQoRNdhZCxAIv%0AA89KKV8xFtcIIcYZ68cBUXKsDwOO48rMjk/t87EOdQpppPnIoebSWuTVHVU8v6WCHScij+HXtvYv%0AJjOJNIV0w+F6Wjt7+Nz8cZa2nzUujdSOanrTJ1hLhQxQS3C8zklJdjKi4fjIxAe8iU2EG5+HjAnw%0A/I1Qd7hv3UCqik28G8+Zv9cUXUw2KvDMJBhe95DVYHGOlLLJfCNVlk/I6JIQQgC/Aw5IKR/xWvVX%0A+sZc3ga8bl3cUYb3UHOzKnOoU0jDNZzzx9NLJsJ2xl5sOqbOWeqILJ9ZSklda//2EiZpCbHkpcZb%0ADhj/Y89J0hNjWTnV2mefOS6N8dThTLToN88oVgqzu92zqMzhpCQnadAD64eMpCy4eY3qjfTsF/p6%0ABJmKINWakvQhIUM1N2s71ZfgoKuKRwfmTIJhjhNYVQS9QghPGobh85dBt1asBL4IXCiE2Gk8Lgd+%0ABlwihDgCXGy8H5t4tyg2/5GG2iLwNJyzNumoPd5USAOzCLrdvWwpawCg3OGKaN+2zh7au91BLQKw%0AnjnU2eNm7f4aVs/OJ9ZiYHn2uFSKRB01douVtmZAuUl1K3X3Sk44XMzMcCsFPBKB4kBkTYKbXlC/%0Ahb9cpy4SLVUq5TNY+mIohDBqCWq8Oo/qrKFRwQi1orYaLP43YKMQ4gNAAOdiBHKDIaXcaGwbiIss%0ASzha6WhW/0TmxSI+VfWJGeoUUisN57z4+iulPEkMMQO0CHZXNuPqUv2AzAwaq3hGVKYFVwRT81J4%0AbWcVUkpECPeN6Ra63KJbCKAkuYcY4WJzdzaWLuHeKaS506luaqfL3cvcBEP5DncxWSjGL4Frfq9c%0ARGvuUNlpA8kYMkkpUBZBWx0gIm+FoIkOHkUwvK2orbaYeAvVbfQQ8BzwbaA95E6nO2azOfNiIYSy%0ACoa68ZwrTAtqLyoaXKw7XE+NTKejcWBtJj45rhTP0uJMzwB3q3iG1qcEz1aampdCa0ePR2kE4x97%0ATpKWEMPKKdbTGmNa1J39vvaMMFsa+A2oOW583kliBFNHQzHjMvjcf6nK8ePrBhYfMPG2CJKyIi98%0A0kSHERpXaTVYfCeq+OvbwP3AM8BD0RNrDGAqAu+LRXLu0LeidjkgJrEviBSCNdtUu4Q6mUFzXfAx%0AjKH45LiDGfmpLC7OpLzBRW9vOA9gH1YtAjBaTTRXwRPnQplvWqjpFrp0TgFxMRE0yDUCv1saU/qP%0ArgxESoFPLYGp+Mb1VKlOrpkl1s89XCz9EpzzLfV6KCwCZ62OD4wmRmhcpdX/sm8Ay4ByKeUFqJqA%0AptC7nOY4jgLC92KRnBuFGIG1quLeXsmabZWcMzWHJnsW7paTYffxp6unl61ljZw9JZuS7GS6eno5%0A2dK/KVsw+iwCC4qgrk0pgFO74fmboLavGnrjkcjdQoDnzn6vK8NT4RwSm01l5BiKoLTeSXKcnaS2%0AMmUtDMT/Phxc9AO49Kew9I6BHyM1X7k3myq0IhhNxKnCyNGaNdQhpewAEELESykPAjOiJ9YYoOGY%0AuojEerlBUqKgCFyO4EPrvdh03EFVUzvXLZtAbPo4EjvrcUdwNw+wq7KJ9m43KyZnq8wZiMg9VNfa%0ASaxdkJEUvLI6LzWelPgYZRHU7lN35DEJ8Ow1YCivv++O3C0EQNMJemJTaCaZAyct+lgzJnoUSGm9%0Ak0m5yYiRaDYXCULA2V+H/DkDP4ZZXVx3UCuC0cQoTx+tNOoIXgPWCiFeB4JP/j4TcBztf7EwLQIr%0AbgmruOotWQQvba0gLSGG1bPzySqYSCat7D0RmVLadMyBELBichYl2eoHGUnAuLa1g9yU+JBBYCEE%0AU/KMzKGa/ZA7A256EVwN8Jdr6XQ2qWyhSN1CAE0nEBnFgOCgOZsgHBnFPhZBSVaScvuNtvjAUGOm%0AGfd06D5Do4nYROWWHI0xAinl1VLKJinlQ6hWE78Dztw21FKqYjL/i0VyHvT2qJ5DQ4XLEbaGoLm9%0Amzf3nuKqheNJiLUzcaKaqLVtb2TdUDcdczCrII2MpDgK0hKIj7FRHkEtQV1rJ7nBhnl7MTVXzRim%0Adr8a2l64EK77I9Tsp+2ZW2j//+2deXTc1ZXnP6+0q0qLrR1LtmS8YGOwsQwx2GAD2WDSCRA7gYSE%0ASUgzJ53pyTJ90uSkZ8hkJmfodKYz6elAloYO6UCIDQkJCQkdwDZgbLAN3jDe5E3yJqlkbZZUUqne%0A/PF+VSpJtfyqVKt1P+fUqdKvfvV7V3K5bt137/1ez5DtJrJxdJ8iZ+Yc6soKx4bURKN8NlzsYHiw%0An7YLAywp85iKjUzoIUgmriBlF9EZyhyUMpVDGZojCKC13qK1/p3WejgZBmUFFzvB0zP5wyLQS5DA%0AyiEbOYLf7z2Dx+tj/Yp6Y0aFqSY51HI40svGMTQyytunTH4AwOFQNFY4Od5pv5ego88TMT/gZ161%0Ai8Fet6mFr1lsDs7/AHzk+1Sce41/KHycVXNjLGfUOjCQ5oraoCE10bDmGp89eQSfhkUFVhR1qTuC%0A4Klm0kOQWeS7MnZrSAimK0TFEIy16Seql8BrfTuNkiPYsLONK2pLuGqWNTvIkhDobm+le8Cev97d%0A2o3H62Nl0AfwnIriwCB3O3T0eSJWDPmZV+1iobKqmmqWBI57lt7Lj/THuZPN5G/9B9vrAiYK8/RC%0A+WwW1ZVytL0fj9fGfGSrhLTrtFEtbVRWkj2TcwSJoLgSlKUQKzmCzCINUtTiCOLBP6d45oSh5onu%0ALrahM3T4fB97WrtZ11w/tjdvJQKr6Oa1I/aik20tbhwKrmsa62BuqnRyyj1gK+k8MurDfXHYdkSw%0A0GFq/qleHDj++pFOHvbcxdnGO2Hz/4Z3YpD69WsGlc/mirpSvD5tT8rCcgQXzhhHUDNy2iSwo80z%0AyHYcjrHcgOQIMos0jKsURxAP7haj/VI+QfPeH2InSm/Ihs7Qxp2t5DoUd14T1FzkrEKjmJ3Xy5bD%0A9mzZdszNlZeVUVY0VvHTWOlkeNTH2Z7ovYPufhN52IkIGmYUsdjRymBOybha+D/sO0tpYR4V9/wI%0A5t4Mz3/Z/myFgCOYw2JrNsFBO5VDrhp0TgHnTx3mfU0zKew5ZiQdos2HvhTw5wkkR5BZZEOOQMBE%0ABDOaJndjFs80Gf9ERQQXI3cVj4z6+M07p7l1UTUVwd/Ec3JRziqWlA6y5XBH1OaqoZFRdp/qDuQH%0A/Myp8JeQRs8TtPeZfgM7EUFujoOl+W205jYGVEID2kJX1pJfUAif+DlULYINn4Wze6NeMzgiaKxw%0AUpDrsJcwdjgYcl5GydBZ1q9oMMOGLvWKIT/+PIHkCDILyRFkCe6W0MlER4750E5UjiCKztCmg+10%0A9g/ziRUNk58sqaWpsI+OPg8Honwg7jp5geFRH9dPSNA2VZoSUjt5grGuYhvDcLRmrm7lgK8+cOj1%0AI530DXn5D1dZ1UKFpfDpDVBYBk+unyQXPYnuU1BQBkXl5OY4WFBjP2F8arSSOY5Obl9SbTmCSzw/%0A4Kf0MigohfzidFsiBCM5gizA54v8rTGRekNRcgQbd7VRVVLAmgUhkn0ltVRazd/Rtoe2H3OT41Cs%0AaJwx7nhNSSGFeQ5bTWX+ruJwEtTj6GmlyHeRnQN1gYTuH/zaQsGS06WXGQnmkUHjDCKV5VoVQ34W%0A1ZXw3tneqNHQRY+XPf2lNOW6KR48b+rqL/VEsZ/VX4VP/lu6rRAmIjmCLKDvDHgHJyeK/SRSb2jA%0ADSgzDHsCHX0eXjnYzl3LZ4We/+uqIW+gncV1pWw5FNkRbGtxs2RWGSWF4zuC/SWksUQElTa2hjh/%0AAID3fA2c6BwIbAt9YHGIJrKaxXD3L0wU9vS9ppIqFN0nxzmCK2pLcV8cjio18cK+s5zwVuIa7TZy%0AFzB9tobKZ8Pctem2QpiI5AiygFBic8G4qhObIyiaETJx+dw7pxn1adY3h9gWArP/e7GDtQtmsuvk%0ABfqGRkKeNjDsZU9b96RtIT9zKoptdRe39w0xozjPXjdw+7sAHNYNHG3vZ+tRa1vo6jBzBJpugjse%0AhZOvw3NfNFFZMEE9BH4W1ZUCRJWa2LirDY/L2qJq2WTup8vWkJCZ5Ltg5OLk93kSEUcQK/7S0XAf%0AFokUnhtwh8wPaK3ZsLOV5bPLAyJuk3DVgPZxa4MDr0+z9WjoAew7T1xgZFRPShT7aax00to1GLWE%0AtL03/IjKSZw/gK+0nn5VzNH2fn6/9ywlhbmsnhehnv3q9XDrQ7D/WXj5W+OfG7xg9lRnjFVxLQpU%0ADoXPj5x0X+St410sWnyVOdDyCuQVxzf1SxAShV9vaCR120PiCGKl65iRhS4JIwHsrDIfSsOxTfcK%0AyUDoruI9bT0cae83VS7hsCpCri4fwlWQGzZPsO2Ym1yHYsWcydtPAE0VpoT0THfkEtKOfg/VJTYS%0AxQDtB3DUXMms8iLeO9trTSKzoS20+quw4n7Y+gN466djxy3RuOCIoLw4P6rUxDO72nAouPHa5eZA%0AlyU2Z2fesSAkizTMJBBHECvuoyY/4Ajzp0tkU1kYR7BhZyuFeQ4+EkmPx2oqyxtoZ9W8Cl4NU0a6%0A/ZibpQ3lOAtCDyaZU2Gvcsh2ROAdhs7DULOYedUuXnrvfORtoWCUgtu+Cwtugz9+HQ6+YI4HlY4G%0As6iuNGzl0KhP8+yuNlbPr6KmbrZRQAXZFhLST37q5xaLI4iVcKWjfvxdmklyBIPDozy/+wy3L6mb%0AlNwdhyUzQd851iyo5nT34KRO236Pl71tPaycG34e8lgJafgIR2ttRQQ2HIH7iBHmq76SeVUuvD4d%0AfVsomJxcWPcY1C2DZz4PbTvHHEHZ+AjpitqSsFITb7R0cqZniPXN9cbB+F8rjkBIN4FxleIIMpNR%0AL1w4HrmqxL+nP1VHoHVIR/Diu+fo83hZt6I+zAst/F2j/edZs9B8yE7cHtpxootRn+b6ueE7S2tK%0AC6KWkPYOehn2+uxFBFbFkD8iAOxtCwWT7zTS1SU18NQn4firpt+gaPyIykURpCY27myjtDCXDyy2%0A/k7+aGK6VAwJmUsaZhKII4iFnlPm22ykb40BmYkplpAO9Zi1JiSLN+5qpWFmESuboqhz5haYiqO+%0Ac8wqL2J+tWuSI9je4iYvR9EcJj8AZn5AY4UzoiPo6Le6iu04gvZ3jTxHxXyurjcf3OPkMeziqoJP%0APwvaZ2b4TpT7IDhhPH57qGdwhBffHZPtBsYSzdOlh0DIXCRHkOFEKx2FxOUIQjSTtXYNsPWom3XL%0AG3A4bCQ0XbXQfx6ANQuqePNYFwPD3sDT2465uaZhBkX5kXV1ovUStPdaIyrtRgSVCyA3n8WXlbLj%0Am+9n9fw4tW4q58GnfmX290P0dYSTmnh+j5HtXtccFFVVzDNqnJXz47NFEBJFIEdgU0o9ASTNESil%0AHldKtSul9gcd+5ZS6rRSard1uz1Z6yeFgOpohG+NeYWmbT9hjmDsQ/KFfUYi+ePNNr9Bl9RA3zkA%0A1iysYnjUx/Zj5rq9QyPsP93DyjBlo8FEKyH1N23ZqhryD6OxsF1yGo6G6+AvX4EPfWfSU+GkJjbu%0AamNBjYur68vGDjb/R7j/z0YvShDSSf6lFRH8DPhwiOPf11ovs24vJHH9xONuMXo20dQanZVTdwQB%0AwbmxD6bXj3ayoMZF/Qyb2jBBEcG1jTMpyssJdBnvON6FTxO2kSyYxoriiCWktiOCoR7oaR0bRpMo%0Aaq6EstA5k4lSE0cs2e71zQ3jR2rmO6G+ObF2CUI8XEo5Aq31q0BXsq6fFtxHoWJu9DpzZ/XUcwQT%0ABOc83lF2nOjihlgGupfUGEegNYV5OVx/eUUgT7CtxU1+roNrZpdHuYiJCCD8/OKOfg8FuQ5KC0OX%0AoAZof8/cV09h6HqMTJSa2LirjRyH4o548hKCkAousYggHH+tlNprbR2Fz1JmIl02h5o7K6cuPDcw%0AXoL67ZPdDI34WD0vBkfgqoXR4YBY25oFVZxwD3Ci8yLbjrlZPrt8LFkaAX8Jabj5xe29Q1SVRB5a%0AD8B5Iy2R8IggAsFSE16f5tdvn+bmhdVT35IShGSRk2vyXinMEUT5CpdwHgX+J6Ct+/8DfD7UiUqp%0AB4AHAGpqati8eXNcC/b398f92mAco8Pc2N3KybIbOBHlevN7vVR1n+aNONb12zu3ZTezHPm89sYO%0AAJ49MoxDwciZA2z2f7OOQlW7myuBHa88z0XXHIoGjHbJD57byoEzI9wxL8/W30ZrTX4OvLb7EA2e%0AE5NsPdw6SKEm6rXmH/4zNTnFvP5OC6hjtn6HqdI/bLaE/rD1HWbkDNPZr1hU2J2Q90QySdT7NlVk%0Ak73ZYOsNqoCOE0c4snlzSuxNqSPQWp/3P1ZK/RT4fYRzfwL8BGDFihV67dq1ca25efNm4n3tONoP%0Awmuaxub303h1lOvpN+DMi6y9cfXk4TVRCNh74VfQUx2w/QcHtrKsAW57/yr7FzuRBwe+x7WLGuBy%0Ac51H3t3ES63DaOCeW1eMG00Zibl7XsVbVMTatddOsvU7b2/h8lona9euiHyRY38Pl13N2ptvtv87%0AJIDv7HqZ4aKZ7Dx7ngpnDv9l3S3khVJszSAS9r5NEdlkb1bYurucWZWlzFq7NiX2pvR/g1IqWBPh%0ATmB/uHMzjoDYXBj56WCcVYAe2+ePhwF3YGh979AIe9t6xmv128E/gaov4H9Zs6CKPo+XwjwHSxvK%0AwrxwMk2V4UtIbekMaW16CFK4LeRnUV0pO05cYHf7KHdcMyvjnYAgUFByaeQIlFK/BLYBC5VSbUqp%0A+4HvKqX2KaX2AjcDX03W+gmny+ohsNNwlIhegoHOQH7gzWOmAzhmRxDoLj4XOOTvMm6eM4OCXPtz%0AeU0J6QDe0fHSuB7vKN0DI9HlJXrPmKqh6tQ7gitqSzjdPciohvXROrIFIRPId14aOQKt9T0hDj+W%0ArPWSjvuo+YAvil5lM6Y3NIXKoQF3wOlsPdpJYZ69Cp9xFLhMBUJQRHD93EpmOvP5wKKamC7VWFHM%0AyKjmTPcQsyvGylc7raH1UZOv7X5pidRVDPnxJ4wbSx1cUVua8vUFIWbyXeaLU4pIdbI4e3G32Jcf%0A8MtMTKVy6OKYztDWo51c11QR0zf4AK6acRFBUX4Obzx4C/kxbo80BqmQBjuC9l4jL1FdGsUR+CuG%0AqhfFtG4iuLq+DKVgTb283YUsId8JvadTtpxsltrFbbN0FMYazuLtJfB6YLgPnBWc7x3iSHs/q2x0%0AAIekpHZcRABQmJdjT6IiiHCD7P0jKqtcUXIE59+F0lkhx24mmzkVTl762hrWNogjELKESyVHcEnh%0A6TPfqu0kisEoYebkx58jCNIZeqPFRBUx5wf8TIgI4qWqpIDi/JxJTWWBofXRIoIJ0hKp5vIqV/Q+%0AB0HIFFKcIxBHYIcuq+bdbkSg1NRGVgbpDL1+xM2M4jwW18W5tx0iIogHpRRzKpycnDCXoL3Pg1JQ%0A4cwP/+LREeg4lJaKIUHISvJdEhFkHHbE5ibirIp/a8jKLejimbzR0skNl1fGvJUTwFVjZp8m4NtF%0AU2XxJDnqjj4PFc58ciPlHNxHwTeSUmkJQchq8p3m/4zXk5LlxBHYwW1FBCGkjsOSgIig1ePkbM9Q%0A/NtCELKXIF7mVDg5NaGEtKNviEqXzUSxRASCYI8CM0sjVVGBOAI7uI+aRGe+TdVPMCWkcTsCo9W3%0A/ZyRR1g1L85EMYTsJYiXpgonXp/mdJAKaUefh+rSKIni9gOW1v+CKdsgCNOCFM8kEEdgh64oc4pD%0A4ZeiDjEwPioDnYDilZMj1M8oYvbMGBzQRAIRwdQdQWOI+cXtfR6qokYEB8zAl1wRehMEWwSkqCUi%0AyBzcR2MfYeisNsqf8TSFDLjRRTN441g3qy6vnFq1S9Ds4qnSaPUP+PMEPq3p7PfYqBh6N60VQ4KQ%0AdRSkdoC9OIJoDHQZGedYh5oHZCbiaCq72Iknfwa9Q15WxTvG0U/RDMgpSEhEUFVSgDOohPTiCIyM%0A6sgRgacPuk9JfkAQYiFfHEFmceJ1cx9rR6zL7wjiqBwacNOFSRbdEG8jmR+lxgbUTJGxElLjCHo8%0AZtsrYkSQhmE0gpD1BHIE4ggygzd/BGWzoWlNbK+bivDcgJvTHidX1JZEr8ixg6s2IREB+FVITY6g%0A23IEESMCqRgShNhJ8bhKcQSROLMbTm6F9/2nmOcKBPSG4ugl0BfdHBsomFrZaDAJiggA5lQUB1RI%0AezymjDRi1VD7AfPtpmx2QtYXhGmBlI9mENsfMR9iyz8T+2uLKwAVe45Aa/SAmw5fSWxjKSORwIig%0AsXKshNS/NRRRefT8AbOt5pC3miDYxh8RSPlomuk9C/t/Ddfca7SDYiUnF4pnxpwjyPVexKG99FBq%0Ae3pYVEpqYKgbRoamfKmmoEH2PR5NcX4OroIw0ZJ/GI1UDAlCbOQWmt4biQjSzI5/AZ/XbAvFSxzd%0AxXkjvQCUVNTiDPcBGysuq5cgAdtDc4JKSLs9OvJAmr5zpuKqZsmU1xWEaYVSlt6Q5AjSx8gg7Hwc%0AFt4em6zERJxV0B+bI/AOmr6DhvoETtIqSZwjqHKZEtIT7gF6hnXkbaF2SRQLQtwUiCNIL3t/BYNd%0AcP1fTe06cchMdFzoBmBBU+PU1h5nh9VUloA8gVKKRmt+sYkIIiSKz1tTyWRrSBBiJ98p5aNpQ2vY%0A/ijUXgVzVk3tWnFsDXX1JMERJDAiADOt7ISVI4gcERyAkjqTKxEEITZSKEUtjmAiLa9Ax0FY+SWz%0ATzcVnFXg6Y0pSTvYb7aG8kqqprZ2MMWVJvGUsMqhYk51DTDojVYxJIliQYibfKdsDaWN7Y+YrZQl%0Ad039WjE2lZ3pHiR3pBevo3CsfCwROBxmmyoBCqRgIgKfpaUX1hF0txpHMKs5IWsKwrSjoEQcQVro%0AOARHX4Jrv5AYpUyXf4i9PUew9WgnFaoPnYytFFdNQmYSwJgKKRC+auitn5j75Z9NyJqCMO2QHEGa%0A2P6oEWhb8fnEXC/GiOCNFjdVjj5yE7kt5KekNqERgZ+QEYGnH3Y9AYs/CuUNCVlTEKYdl0KOQCn1%0AuFKqXSm1P+jYTKXUn5VSR6z7GclaP2YGumDP07D0k2aWQCLwO4IoMhOjPs2R8328frSTy3L7UMVT%0AFJoLRQIjgkpXfqCJLGTV0O6nwNNj8iyCIMRHCnMECepYCsnPgH8Gfh507EHgZa31w0qpB62f/zaJ%0ANthn17+CdxDe98XEXTNERKC15lzvEHtau9nd2sOe1m72ne6h3+MFoLKkzyR3E01JrbFj1Bu7btIE%0ATAlpMe+e7mXmxKH1Ph+8+SjMWgEN105pHUGY1hSUwMgA6NGkL5U0R6C1flUp1Tjh8MeAtdbjJ4DN%0AJNER/HHfWZ57z8Nr/QcinpfjG+Gv9z1CZ+l1/OptBw51EIdSOBRg3WttPsR92gxk8Vk/a8DnM/eh%0A+LqjmHf2HeTfuw/QemGAPa3dtPeZgdR5OYrFdaXctXwWS+vLWTa7nJIf9Vo6RQnGVQNo4wxK66Z8%0AubmVLs64+8hxTKisOvIidB2DdX835TUEYVpjFYzkjCZ/gH0yI4JQ1Gitz1qPzwE14U5USj0APABQ%0AU1PD5s2bY17suYMetrSNQNvxiOd9RG2lJLeTrw18jk2vtZgPfQjcB2zCVJROvI+0v3ZfrouOc608%0Adfo45QWKeeU5fLA+n7llDhpKHeQ5vEAn9HXStm+Ey0cHOdbey6k4ft9IVHR2chWwc8sf6C+JcchO%0ACG4s87FgoW/Sv8vS3d+hqKCCN9vL0An+HaZKf39/XO+jdJBNtkJ22ZstttadOcNCYKjXnXx7tdZJ%0AuwGNwP6gn7snPH/BznWam5t1vGzatCnyCT6f1j9eo/U/NWs9OhrylNFRn/b5fPEZ8NP3a/2z6K/I%0AhgAADM9JREFUv7B3bs8ZrR8q1XrHY/GtFYnWnebaB/+YsEtO+tue3WvWeO37CVsjkUR9L2QQ2WSr%0A1tllb9bYumeD1g+V6u1/+Le4LwHs1DY+Y1NdNXReKVUHYN3HMb4rwZzaDmfegZVfDCuV7HCo+OcG%0AO6vsS1EPWOclJUfgn12cmMqhkGx/FPKKofm+5K0hCNMFa25xzujUVYOjkeqtod8B9wEPW/e/Tepq%0Aezey4NAG6P11+HNO74LCclh6d3JscFVB21v2zh1wm/tk5Aj8g3LsVA4NdJnk+bV/CYWl9q7fdx72%0AbTR9A0WZUwwmCFmLlSPI9Q4mfamkOQKl1C8xieFKpVQb8BDGAWxQSt0PnAQ+kaz1AWg/QIV7B/RF%0AaQ5b87eJ7eQNxlllPuB9o+DIiXyuP3JIVPlqMLn5xsHYiQhe/0d44//B8dfg0xshJy/6a3Y+DqPD%0Aia26EoTpTGGZJSEfrhQlcSSzauieME/dmqw1J/H+h9iWu4a1a9embMlJOKtB+8y3bFeURrGBLnOf%0AjIgArEllUSICTz/s+rmR3z62CZ7/Mnzsh5F1l0aGzPyG+R+CyqknogVBAOqWwt8cojsFiW3pLE42%0A/m/3drqLBzrRqORtrZTURI8Idj9pmsHu+qmJlHY/CZsfjvya/c+Y/MZUZbsFQUgLqc4RTD8CekPt%0AQBQlzgE33lwXedG2kOK2pRY6Dod/3uczCd/6a6F+hRGM62mDLQ9DWX3o2c1aw7ZHoPpKaFqTHLsF%0AQUgqEhEkm0B3sY3KoYudDOfHMR/ZLiU1ZiaBDrPnePhPcOE4rLS+2SsFf/EDuPwWs0V09KXJrzm+%0AxUwiW/nFqct2C4KQFsQRJBubekMADLgZybNZpRMPrlrwjYzlIiay/REorYdFHx07lpMH658wcwU2%0A3Adn90x4zaOm3PWq9cmzWxCEpCKOINkUzQBHrs0cQZIdgb+XoO/s5OfO7oUTr8H7HpisRVRYaqqH%0ACsvhyfXQfQqAooHTJoq49n7IizCyUhCEjEYcQbJRymoqy5CIAEInjLc/CnnO8PMDSuvg3mdMhdAv%0A1sHgBerbnoecfFhxf/JsFgQh6YgjSAV2uou1TmFEMKGEtO+8qfxZ9qnIFUvVi+DuJ00e4am7qT33%0ACixZN3ZdQRCyEnEEqcBZFT1H0N8OPm96IoKdj8HoiEn4RqPpRrjjUWjdTo7PIyWjgnAJIOWjqcBV%0ADZ1Hwj/v9cCz94Mjj+7yJcmzI78YCkrHRwQjQ7DjMVjwYai43N51rloHIwOc3L2FObVXJcdWQRBS%0AhjiCVOCsNDkCrSeXWGoNv/3PJlF750/ov5DkbZaJIyv3bTTNYHaigWCWf5bjvbOZk1jrBEFIA7I1%0AlAqc1eAdCj127uVvw74NcMt/M2Myk03wyEqtTZK4Zgk03ZT8tQVByEjEEaSCcL0EOx4zAm/Nn4Mb%0A/2tqbAmOCALNYH8lzWCCMI0RR5AKXJNnF3Poj/DC35i9+du/l7oPYn9E4JeGcFbBko+nZm1BEDIS%0AcQSpYOIQ+9O74JnPG3XBdY9PeZh8TJTUgncQzrxt5gtf+wVpBhOEaY44glTgHwrT324Guz/5CeMc%0APrUheXMQwuEvIX3pW1Yz2OdTu74gCBmHVA2lAr8Udcch2PZD0KNw77NjyqSpxN/8dfxVWHZvemwQ%0ABCGjEEeQCnLyTMfuWz+GnAK473dQOT89tvgjAoi9ZFQQhEsS2RpKFc4qQMHHfwqzV6bPDn9E0HQT%0A1CaxeU0QhKxBIoJUsfqrRoV08cfSa0dBKdz8TVh4W3rtEAQhYxBHkCqWfSrdFhiUgjVfT7cVgiBk%0AELI1JAiCMM0RRyAIgjDNEUcgCIIwzUlLjkApdQLoA0YBr9Z6RTrsEARBENKbLL5Zax1lbJcgCIKQ%0AbGRrSBAEYZqjtNapX1Sp40APZmvox1rrn4Q45wHgAYCamprmp59+Oq61+vv7cblcU7A2tWSTvdlk%0AK2SXvdlkK2SXvdlkK0zN3ptvvnmXra13rXXKb8As674a2APcFOn85uZmHS+bNm2K+7XpIJvszSZb%0Atc4ue7PJVq2zy95sslXrqdkL7NQ2PpPTEhEEo5T6FtCvtf5ehHM6gJNxLlEJZFMuIpvszSZbIbvs%0AzSZbIbvszSZbYWr2ztFaV0U7KeXJYqWUE3Borfusxx8Evh3pNXZ+kQjr7dRZVJWUTfZmk62QXfZm%0Ak62QXfZmk62QGnvTUTVUA/xGmYlcucBTWus/pcEOQRAEgTQ4Aq31MWBpqtcVBEEQQjMdykcnVSRl%0AONlkbzbZCtllbzbZCtllbzbZCimwN+3JYkEQBCG9TIeIQBAEQYiAOAJBEIRpziXtCJRSH1ZKHVJK%0AHVVKPZgmGx5XSrUrpfYHHZuplPqzUuqIdT8j6LlvWPYeUkp9KOh4s1Jqn/XcPymr7CrBtjYopTYp%0ApQ4opd5VSn05w+0tVEq9pZTaY9n7PzLZXmudHKXUO0qp32eBrSesdXYrpXZmsr1KqXKl1DNKqYNK%0AqfeUUtdnsK0Lrb+p/9arlPpKWu2103WWjTcgB2gB5gL5mA7mxWmw4yZgObA/6Nh3gQetxw8Cf289%0AXmzZWQA0WfbnWM+9BawEFPBH4LYk2FoHLLcelwCHLZsy1V4FuKzHecCb1poZaa+1zteAp4DfZ/J7%0AwVrnBFA54VhG2gs8AXzBepwPlGeqrRPszgHOAXPSaW/SfsF034DrgReDfv4G8I002dLIeEdwCKiz%0AHtcBh0LZCLxo/R51wMGg4/dgNJqSbfdvgQ9kg71AMfA28L5MtReoB14GbmHMEWSkrda1TzDZEWSc%0AvUAZcByr+CWTbQ1h+weBrem291LeGpoFtAb93GYdywRqtNZnrcfnME12EN7mWdbjiceThlKqEbgG%0A8y07Y+21tlp2A+3An7XWmWzv/wW+DviCjmWqrQAaeEkptUsZEchMtbcJ6AD+1dp2+xdlVAsy0daJ%0A3A380nqcNnsvZUeQFWjjyjOqhlcp5QKeBb6ite4Nfi7T7NVaj2qtl2G+bV+nlFoy4fmMsFcp9RGg%0AXWu9K9w5mWJrEKutv+1twJeUUjcFP5lB9uZitl8f1VpfA1zEbK0EyCBbAyil8oGPAhsnPpdqey9l%0AR3AaaAj6ud46lgmcV0rVAVj37dbxcDafth5PPJ5wlFJ5GCfwpNb615lurx+tdTewCfhwhtq7Cvio%0AMtP5ngZuUUr9IkNtBUBrfdq6bwd+A1yXofa2AW1WNAjwDMYxZKKtwdwGvK21Pm/9nDZ7L2VHsAOY%0Ar5Rqsjzv3cDv0myTn98B91mP78PsxfuP362UKlBKNQHzgbescLFXKbXSqgr4bNBrEoZ17ceA97TW%0A/5gF9lYppcqtx0WYfMbBTLRXa/0NrXW91roR8158RWt9bybaCkYcUilV4n+M2cven4n2aq3PAa1K%0AqYXWoVuBA5lo6wTuYWxbyG9XeuxNZiIk3TfgdkzlSwvwzTTZ8EvgLDCC+eZyP1CBSRoeAV4CZgad%0A/03L3kMEVQAAKzD/EVuAf2ZCYixBtq7GhKN7gd3W7fYMtvdq4B3L3v3Af7eOZ6S9QWutZSxZnJG2%0AYqrt9li3d/3/fzLY3mXATuu98BwwI1NttdZxAm6gLOhY2uwViQlBEIRpzqW8NSQIgiDYQByBIAjC%0ANEccgSAIwjRHHIEgCMI0RxyBIAjCNEccgSAkAaXUWmUpjApCpiOOQBAEYZojjkCY1iil7lVmpsFu%0ApdSPLRG7fqXU95WZcfCyUqrKOneZUmq7UmqvUuo3fr14pdQ8pdRLysxFeFspdbl1eZca08h/0q8V%0Ar5R6WJmZD3uVUt9L068uCAHEEQjTFqXUIuCTwCptxNVGgU9juj53aq2vBLYAD1kv+Tnwt1rrq4F9%0AQcefBH6otV4K3IDpJAej3voVjJ78XGCVUqoCuBO40rrO/0rubykI0RFHIExnbgWagR2WlPWtmA9s%0AH/Ar65xfAKuVUmVAudZ6i3X8CeAmS49nltb6NwBa6yGt9YB1zlta6zattQ8j19EI9ABDwGNKqbsA%0A/7mCkDbEEQjTGQU8obVeZt0Waq2/FeK8eHVYPEGPR4FcrbUXo+L5DPAR4E9xXlsQEoY4AmE68zKw%0ATilVDYF5vHMw/y/WWed8Cnhda90DXFBK3Wgd/wywRWvdB7Qppe6wrlGglCoOt6A166FMa/0C8FVg%0AaTJ+MUGIhdx0GyAI6UJrfUAp9XfAvyulHBiF2C9hBptcZz3XjskjgJEG/pH1QX8M+Jx1/DPAj5VS%0A37ausT7CsiXAb5VShZiI5GsJ/rUEIWZEfVQQJqCU6tdau9JthyCkCtkaEgRBmOZIRCAIgjDNkYhA%0AEARhmiOOQBAEYZojjkAQBGGaI45AEARhmiOOQBAEYZrz/wGeqe+kX79t/gAAAABJRU5ErkJggg==
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation.

  1. There is definitely an improvement by using Tanh and using the Hinge Loss Function.
  2. An improvement in comparison to varying only the acivation function.
  3. Increasing the number of epochs and training with this option could be a feasable option to try out. There is an improvement accuracy with the increase in epochs.
  4. Hence, Hinge Loss with the combination of Tanh is a feasable option

In this case though it did not reach the bench mark accuracy , it showed a substantial improvement and we can definitely evelaute its performance for larger number of epochs

Highest Accuracy Train Accuracy = 34.38% Test Accuracy=32.52%

Next, we will try the Hinge Loss without applying the reduce_mean in the cost with sigmoid activation function

Hyper Parameter Tuning

Loss Function - Hing Loss without Reduce_Mean and activation function Sigmoid

Now we will try the loss function hinge loss ut without he reduce_mean function applied to the cost. Will also change the activation to Sigmoid.

Resuse the code from the Initial Model, and ensure the function named model_lenet5 has the activation function Sigmoid. Follow the comments to see the change

In [53]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation. Ste activation function to Sigmoid
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Reuse the code from the initla model and repalce the cost function to Hinge_lossand remove the reduce mean function . Observe the comments clasely to make the change

In [54]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10

#number of iterations and learning rate
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)
    
# change the softmax cross entropy to hine_loss and ensure to remove the reduce_mean function
    #4. then we compute the hinge_loss between the logits and the (actual) labels
    loss = tf.losses.hinge_loss(logits=logits, labels=tf_train_labels)
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Reuse the code for initializing the tensorflow session from the first model. Ensure num_steps=7001

In [55]:
train=[]
test=[]
display=[]
### running the tensorflow session
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:10.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with learning_rate 0.5
step 0000 : loss is       1.55 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 0200 : loss is       0.25 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0400 : loss is       0.22 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 0600 : loss is       0.21 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0800 : loss is       0.21 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 1000 : loss is       0.21 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 1200 : loss is       0.22 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 1400 : loss is       0.21 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 1600 : loss is       0.21 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1800 : loss is       0.21 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 2000 : loss is       0.21 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 2200 : loss is       0.21 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 2400 : loss is       0.21 , accuracy on training set 3.12 %, accuracy on test set 10.00 %
step 2600 : loss is       0.21 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 2800 : loss is       0.21 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 3000 : loss is       0.21 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 3200 : loss is       0.21 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 3400 : loss is       0.21 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 3600 : loss is       0.21 , accuracy on training set 1.56 %, accuracy on test set 10.00 %
step 3800 : loss is       0.21 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 4000 : loss is       0.21 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 4200 : loss is       0.21 , accuracy on training set 3.12 %, accuracy on test set 10.00 %
step 4400 : loss is       0.21 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 4600 : loss is       0.21 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 4800 : loss is       0.21 , accuracy on training set 15.62 %, accuracy on test set 10.00 %
step 5000 : loss is       0.21 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 5200 : loss is       0.20 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 5400 : loss is       0.20 , accuracy on training set 17.19 %, accuracy on test set 10.00 %
step 5600 : loss is       0.21 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 5800 : loss is       0.21 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 6000 : loss is       0.21 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 6200 : loss is       0.21 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 6400 : loss is       0.21 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 6600 : loss is       0.21 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 6800 : loss is       0.21 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 7000 : loss is       0.21 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
In [57]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation :

Removing the reduce_mean function byu applying the hinge loss had an effect on the accuracy. The testing accuracy did not improve over 10%. Moreover this also affected as the network did not plateau as well. Hence, the reduce_mean function was required to improve the accuracy. Though at this stage the Gradient Descent Optimizer worked out the best.

Highest Train Accuracy= 17%

Highest Test Accuracy=10%

Final Observation of Loss Function

  1. Though the Softmax Cross Entropy with Logits worked the best, it is prospective to try Hinge Loss with an activation of the Tanh.

  2. The Hingle loss with the reduce mean funtion with the Tanh Function works the best. Hence, it is a prospective parameter apart from Softmax Cross Entropy with Logits and the sigmoid function

Next, lets observe the effect of number of epochs on the CNN MOdel. Though it is eveident that we need to train the CIFAR 10 model further to improve accuraciesto 80 to 90 % , it will be intuitive to check if increasing the number of epochs indeed increses the accuracy.

Hyper Parameter Tuning with Number of Epochs for CNN

Number of Epochs =10000

The model was trained for 10000 . The model design is the same as the first model with only changes in number of epochs.

Start by using the functions to initialize functions for intilaizing the weights and layers with weighsta nd bias. Make sure the activation is set back to Sigmoid to avaoid errors

In [64]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Next, use the code of the intial model to intialize the hyper parameters. Ensure Number of Epochs to 10001. Follow the comments to see the change made.

In [65]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10


#number of iterations and learning rate
# set the num_steps to 10000
num_steps = 10000
display_step = 200
learning_rate = 0.5
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Use the same code as the initla model to initialize the tensorflow session

In [66]:
train=[]
test=[]
display=[]
#number of iterations and learning rate

### running the tensorflow session
with tf.Session(graph=graph) as session:
    
    tf.global_variables_initializer().run()
    print('Initialized with epochs', num_steps)
    
    
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with epochs 10000
step 0000 : loss is 002.63 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 0200 : loss is 002.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0400 : loss is 002.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 0600 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0800 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1000 : loss is 002.19 , accuracy on training set 15.62 %, accuracy on test set 15.04 %
step 1200 : loss is 002.21 , accuracy on training set 17.19 %, accuracy on test set 15.58 %
step 1400 : loss is 002.24 , accuracy on training set 12.50 %, accuracy on test set 17.70 %
step 1600 : loss is 002.10 , accuracy on training set 23.44 %, accuracy on test set 19.10 %
step 1800 : loss is 002.20 , accuracy on training set 18.75 %, accuracy on test set 21.85 %
step 2000 : loss is 002.00 , accuracy on training set 29.69 %, accuracy on test set 23.25 %
step 2200 : loss is 002.04 , accuracy on training set 21.88 %, accuracy on test set 23.88 %
step 2400 : loss is 002.07 , accuracy on training set 28.12 %, accuracy on test set 25.21 %
step 2600 : loss is 001.82 , accuracy on training set 31.25 %, accuracy on test set 24.62 %
step 2800 : loss is 001.98 , accuracy on training set 20.31 %, accuracy on test set 28.13 %
step 3000 : loss is 002.00 , accuracy on training set 23.44 %, accuracy on test set 29.98 %
step 3200 : loss is 002.18 , accuracy on training set 21.88 %, accuracy on test set 30.01 %
step 3400 : loss is 002.04 , accuracy on training set 23.44 %, accuracy on test set 30.39 %
step 3600 : loss is 001.98 , accuracy on training set 23.44 %, accuracy on test set 30.74 %
step 3800 : loss is 001.97 , accuracy on training set 28.12 %, accuracy on test set 31.20 %
step 4000 : loss is 001.95 , accuracy on training set 26.56 %, accuracy on test set 31.52 %
step 4200 : loss is 001.77 , accuracy on training set 39.06 %, accuracy on test set 34.07 %
step 4400 : loss is 001.86 , accuracy on training set 26.56 %, accuracy on test set 34.71 %
step 4600 : loss is 001.81 , accuracy on training set 34.38 %, accuracy on test set 34.34 %
step 4800 : loss is 001.73 , accuracy on training set 43.75 %, accuracy on test set 36.80 %
step 5000 : loss is 001.79 , accuracy on training set 35.94 %, accuracy on test set 36.84 %
step 5200 : loss is 001.79 , accuracy on training set 32.81 %, accuracy on test set 37.05 %
step 5400 : loss is 001.66 , accuracy on training set 37.50 %, accuracy on test set 36.74 %
step 5600 : loss is 001.84 , accuracy on training set 34.38 %, accuracy on test set 34.33 %
step 5800 : loss is 001.63 , accuracy on training set 42.19 %, accuracy on test set 38.33 %
step 6000 : loss is 002.02 , accuracy on training set 29.69 %, accuracy on test set 37.97 %
step 6200 : loss is 001.84 , accuracy on training set 37.50 %, accuracy on test set 39.63 %
step 6400 : loss is 001.75 , accuracy on training set 35.94 %, accuracy on test set 38.19 %
step 6600 : loss is 001.62 , accuracy on training set 40.62 %, accuracy on test set 38.66 %
step 6800 : loss is 002.00 , accuracy on training set 34.38 %, accuracy on test set 40.66 %
step 7000 : loss is 001.81 , accuracy on training set 37.50 %, accuracy on test set 38.01 %
step 7200 : loss is 001.63 , accuracy on training set 48.44 %, accuracy on test set 39.43 %
step 7400 : loss is 001.86 , accuracy on training set 32.81 %, accuracy on test set 38.82 %
step 7600 : loss is 001.95 , accuracy on training set 28.12 %, accuracy on test set 41.19 %
step 7800 : loss is 001.50 , accuracy on training set 37.50 %, accuracy on test set 42.76 %
step 8000 : loss is 001.34 , accuracy on training set 59.38 %, accuracy on test set 41.93 %
step 8200 : loss is 001.60 , accuracy on training set 34.38 %, accuracy on test set 39.07 %
step 8400 : loss is 001.52 , accuracy on training set 37.50 %, accuracy on test set 43.74 %
step 8600 : loss is 001.54 , accuracy on training set 45.31 %, accuracy on test set 42.01 %
step 8800 : loss is 001.52 , accuracy on training set 45.31 %, accuracy on test set 43.81 %
step 9000 : loss is 001.45 , accuracy on training set 45.31 %, accuracy on test set 41.43 %
step 9200 : loss is 001.40 , accuracy on training set 51.56 %, accuracy on test set 43.66 %
step 9400 : loss is 001.70 , accuracy on training set 45.31 %, accuracy on test set 43.53 %
step 9600 : loss is 001.64 , accuracy on training set 39.06 %, accuracy on test set 43.09 %
step 9800 : loss is 001.57 , accuracy on training set 45.31 %, accuracy on test set 42.97 %

Lets plot the accuracy vs number of epochs to note changes

In [68]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation

It is clearly observed that the accuracy increases with the increase in number of epochs. There are instances where the network is lost in various parts of the loss scenarios and hence we see a random drop in accuracy from 50% to 30% owning to random seed and the loss scenario. We also observe a uniform rise in in testing and training accuracy which is a good sign. There is definitely a scope to train the network to see a the network plateau.

But it is clearly visible that a rise in number of epochs will defintely cause a rise in accuracy and the network should eventually plateau

Highest

Train Accuracy = 60%

Test Accuracy= 43%

Hyper Parameter Tuning the Number of Epochs for CNN

Lets select Number of Epochs - 15000. The model was trained for 15000 epochs.

Reuse the code for the initla model to assign weights and bias to the network

In [72]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Reuse the code to initialize hyper parameters same as the initla model. Moify num_steps=15000. Follow the comments to see the change made

In [73]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10


#number of iterations and learning rate
# Change num_steps=15001
num_steps = 15001
display_step = 200
learning_rate = 0.5
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Reuse the code to initialize the tensorflow session from the intial model.

In [74]:
train=[]
test=[]
display=[]
#number of iterations and learning rate

### running the tensorflow session
with tf.Session(graph=graph) as session:
    
    tf.global_variables_initializer().run()
    print('Initialized with epochs', num_steps)
    
    
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with epochs 15001
step 0000 : loss is 002.27 , accuracy on training set 20.31 %, accuracy on test set 10.00 %
step 0200 : loss is 002.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0400 : loss is 002.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 0600 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0800 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1000 : loss is 002.28 , accuracy on training set 14.06 %, accuracy on test set 14.28 %
step 1200 : loss is 002.21 , accuracy on training set 17.19 %, accuracy on test set 15.26 %
step 1400 : loss is 002.24 , accuracy on training set 14.06 %, accuracy on test set 16.29 %
step 1600 : loss is 002.12 , accuracy on training set 25.00 %, accuracy on test set 18.43 %
step 1800 : loss is 002.22 , accuracy on training set 15.62 %, accuracy on test set 19.93 %
step 2000 : loss is 002.02 , accuracy on training set 31.25 %, accuracy on test set 20.85 %
step 2200 : loss is 002.06 , accuracy on training set 17.19 %, accuracy on test set 22.03 %
step 2400 : loss is 002.00 , accuracy on training set 18.75 %, accuracy on test set 26.21 %
step 2600 : loss is 001.84 , accuracy on training set 26.56 %, accuracy on test set 26.24 %
step 2800 : loss is 001.97 , accuracy on training set 21.88 %, accuracy on test set 27.35 %
step 3000 : loss is 001.93 , accuracy on training set 26.56 %, accuracy on test set 30.60 %
step 3200 : loss is 002.09 , accuracy on training set 25.00 %, accuracy on test set 28.07 %
step 3400 : loss is 001.85 , accuracy on training set 39.06 %, accuracy on test set 24.66 %
step 3600 : loss is 001.98 , accuracy on training set 32.81 %, accuracy on test set 27.22 %
step 3800 : loss is 001.98 , accuracy on training set 26.56 %, accuracy on test set 33.52 %
step 4000 : loss is 001.95 , accuracy on training set 31.25 %, accuracy on test set 34.83 %
step 4200 : loss is 001.74 , accuracy on training set 32.81 %, accuracy on test set 36.57 %
step 4400 : loss is 001.77 , accuracy on training set 31.25 %, accuracy on test set 36.78 %
step 4600 : loss is 001.81 , accuracy on training set 29.69 %, accuracy on test set 35.35 %
step 4800 : loss is 001.67 , accuracy on training set 40.62 %, accuracy on test set 36.96 %
step 5000 : loss is 001.86 , accuracy on training set 34.38 %, accuracy on test set 37.96 %
step 5200 : loss is 001.85 , accuracy on training set 23.44 %, accuracy on test set 36.68 %
step 5400 : loss is 001.56 , accuracy on training set 39.06 %, accuracy on test set 38.82 %
step 5600 : loss is 001.97 , accuracy on training set 29.69 %, accuracy on test set 37.82 %
step 5800 : loss is 001.55 , accuracy on training set 40.62 %, accuracy on test set 38.03 %
step 6000 : loss is 001.81 , accuracy on training set 39.06 %, accuracy on test set 38.53 %
step 6200 : loss is 001.79 , accuracy on training set 35.94 %, accuracy on test set 40.89 %
step 6400 : loss is 001.83 , accuracy on training set 35.94 %, accuracy on test set 41.36 %
step 6600 : loss is 001.59 , accuracy on training set 48.44 %, accuracy on test set 42.52 %
step 6800 : loss is 002.00 , accuracy on training set 31.25 %, accuracy on test set 42.33 %
step 7000 : loss is 001.73 , accuracy on training set 34.38 %, accuracy on test set 39.17 %
step 7200 : loss is 001.61 , accuracy on training set 48.44 %, accuracy on test set 40.09 %
step 7400 : loss is 001.86 , accuracy on training set 35.94 %, accuracy on test set 41.82 %
step 7600 : loss is 001.83 , accuracy on training set 31.25 %, accuracy on test set 43.37 %
step 7800 : loss is 001.49 , accuracy on training set 40.62 %, accuracy on test set 42.53 %
step 8000 : loss is 001.34 , accuracy on training set 54.69 %, accuracy on test set 42.37 %
step 8200 : loss is 001.54 , accuracy on training set 40.62 %, accuracy on test set 42.87 %
step 8400 : loss is 001.42 , accuracy on training set 46.88 %, accuracy on test set 43.64 %
step 8600 : loss is 001.63 , accuracy on training set 40.62 %, accuracy on test set 43.32 %
step 8800 : loss is 001.42 , accuracy on training set 39.06 %, accuracy on test set 44.35 %
step 9000 : loss is 001.44 , accuracy on training set 40.62 %, accuracy on test set 44.25 %
step 9200 : loss is 001.45 , accuracy on training set 53.12 %, accuracy on test set 45.19 %
step 9400 : loss is 001.71 , accuracy on training set 35.94 %, accuracy on test set 42.87 %
step 9600 : loss is 001.60 , accuracy on training set 40.62 %, accuracy on test set 43.80 %
step 9800 : loss is 001.56 , accuracy on training set 42.19 %, accuracy on test set 45.15 %
step 10000 : loss is 001.75 , accuracy on training set 31.25 %, accuracy on test set 42.02 %
step 10200 : loss is 001.50 , accuracy on training set 53.12 %, accuracy on test set 44.28 %
step 10400 : loss is 001.50 , accuracy on training set 45.31 %, accuracy on test set 44.81 %
step 10600 : loss is 001.40 , accuracy on training set 43.75 %, accuracy on test set 46.62 %
step 10800 : loss is 001.59 , accuracy on training set 43.75 %, accuracy on test set 45.49 %
step 11000 : loss is 001.43 , accuracy on training set 48.44 %, accuracy on test set 45.59 %
step 11200 : loss is 001.56 , accuracy on training set 45.31 %, accuracy on test set 45.73 %
step 11400 : loss is 001.56 , accuracy on training set 39.06 %, accuracy on test set 45.11 %
step 11600 : loss is 001.90 , accuracy on training set 29.69 %, accuracy on test set 44.65 %
step 11800 : loss is 001.56 , accuracy on training set 48.44 %, accuracy on test set 45.13 %
step 12000 : loss is 001.40 , accuracy on training set 54.69 %, accuracy on test set 45.70 %
step 12200 : loss is 001.59 , accuracy on training set 40.62 %, accuracy on test set 45.97 %
step 12400 : loss is 001.39 , accuracy on training set 50.00 %, accuracy on test set 44.78 %
step 12600 : loss is 001.55 , accuracy on training set 43.75 %, accuracy on test set 46.42 %
step 12800 : loss is 001.59 , accuracy on training set 34.38 %, accuracy on test set 45.76 %
step 13000 : loss is 001.39 , accuracy on training set 45.31 %, accuracy on test set 46.42 %
step 13200 : loss is 001.45 , accuracy on training set 59.38 %, accuracy on test set 47.02 %
step 13400 : loss is 001.70 , accuracy on training set 32.81 %, accuracy on test set 42.29 %
step 13600 : loss is 001.32 , accuracy on training set 48.44 %, accuracy on test set 44.00 %
step 13800 : loss is 001.57 , accuracy on training set 35.94 %, accuracy on test set 46.46 %
step 14000 : loss is 001.44 , accuracy on training set 45.31 %, accuracy on test set 46.26 %
step 14200 : loss is 001.54 , accuracy on training set 50.00 %, accuracy on test set 45.88 %
step 14400 : loss is 001.49 , accuracy on training set 57.81 %, accuracy on test set 47.58 %
step 14600 : loss is 001.61 , accuracy on training set 40.62 %, accuracy on test set 42.13 %
step 14800 : loss is 001.66 , accuracy on training set 40.62 %, accuracy on test set 46.67 %
step 15000 : loss is 001.35 , accuracy on training set 54.69 %, accuracy on test set 47.35 %

Let plot the graph to see the accuracy

In [75]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation:

In Comparison to the 10000 epochs , there is some stability in the testing and training accuracies in 15000 epochs. This is because in the case of the 10000 epochs in the last epoch there is a drastic diffrence in accuracies between the training and testing accuracies. In comparison to that there is some stability in this case. The network yet looses its way in the loss scenario at around 6000 epochs and around 8000 epochs again with noted change in the accuracy.More iterations might see the network inprove its accuracy. From the references , it is clearly visible that CIFAR 10 can reach an accuracy of around 80% by training it to 150000 epochs.

Highest Train Accuracy =60%

Test Accuracy =48%

though similar accuracies as the previous tuning results but with stability between the train and test accuracies

Hyper Parameter Tuning for Gradient Estimation for CNN

Gradient Estimation

The network is tuned for Adam Optimizer and the Adagrad Optimizer.

Lets tune the gradient estimation for the Adam Optimizer.

Reuse the code from the initial model for intilaizing the weights and bias

In [60]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Intilaize the hyer parametr from the earlier model and change the optimizer to AdamOptimizer in the hyper paramete. Follow comments to observe the change made.

In [61]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10


#number of iterations and learning rate
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    # Change to Adam Optimizer here
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Intilialize the session as per the previous model and run the model

In [62]:
train=[]
test=[]
display=[]
#number of iterations and learning rate

### running the tensorflow session
with tf.Session(graph=graph) as session:
    
    tf.global_variables_initializer().run()
    print('Initialized with epochs', num_steps)
    
    
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with epochs 7001
step 0000 : loss is 002.53 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0200 : loss is 003.80 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 0400 : loss is 003.69 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 0600 : loss is 002.92 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 0800 : loss is 006.60 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 1000 : loss is 003.84 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 1200 : loss is 004.33 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 1400 : loss is 004.36 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 1600 : loss is 002.55 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1800 : loss is 003.37 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 2000 : loss is 002.72 , accuracy on training set 17.19 %, accuracy on test set 10.00 %
step 2200 : loss is 003.39 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 2400 : loss is 007.66 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 2600 : loss is 006.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 2800 : loss is 002.64 , accuracy on training set 15.62 %, accuracy on test set 10.00 %
step 3000 : loss is 002.96 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 3200 : loss is 003.17 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 3400 : loss is 005.37 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 3600 : loss is 003.88 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 3800 : loss is 004.12 , accuracy on training set 15.62 %, accuracy on test set 10.00 %
step 4000 : loss is 005.22 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 4200 : loss is 003.97 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 4400 : loss is 003.46 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 4600 : loss is 004.83 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 4800 : loss is 005.08 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 5000 : loss is 002.73 , accuracy on training set 15.62 %, accuracy on test set 10.00 %
step 5200 : loss is 003.62 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 5400 : loss is 003.78 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 5600 : loss is 003.48 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 5800 : loss is 002.75 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 6000 : loss is 003.34 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 6200 : loss is 003.98 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 6400 : loss is 006.36 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 6600 : loss is 002.47 , accuracy on training set 17.19 %, accuracy on test set 10.00 %
step 6800 : loss is 003.73 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 7000 : loss is 005.81 , accuracy on training set 9.38 %, accuracy on test set 10.00 %

Plot the accuracy results vs the epochs

In [63]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation

It is observed that with the Adam Optimizer there is no increase in test accuracy. The Training and Testing accuracies do not improve comsistently hence, the network would plateau after a considerable number of epochs . But considering the 7000 epochs benchmark with the Gradient Descdent Optimizer there was no improvement in changeing the optimizer to Adam Optimizer.

Highest Train Accuracy = 17% Test Accuracy=10%

Hyper Parameter Tuning the Gradient Estimation for CNN

Gradient Estimation

Lets use the Adadelta Gradient Estimation.

Resuse the code from the initla model to initlaize weights and bias

In [60]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Reuse the code to set the hyper parameters from the intial mode. Change the Optimizer to Adadelta. Observe the comments to make the change

In [61]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10


#number of iterations and learning rate
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    # change the optimizer to tf.nn.AdadeltaOptimizer
    optimizer = tf.train.AdadeltaOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))
In [65]:
#Create lists to store the values forplotting
train=[]
test=[]
display=[]

### running the tensorflow session
with tf.Session(graph=graph) as session:
    
    tf.global_variables_initializer().run()
    print('Initialized with epochs', num_steps)
    
    
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with epochs 7001
step 0000 : loss is 002.58 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0200 : loss is 002.30 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 0400 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0600 : loss is 002.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 0800 : loss is 002.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 1000 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1200 : loss is 002.30 , accuracy on training set 7.81 %, accuracy on test set 10.02 %
step 1400 : loss is 002.28 , accuracy on training set 21.88 %, accuracy on test set 17.02 %
step 1600 : loss is 002.30 , accuracy on training set 7.81 %, accuracy on test set 15.16 %
step 1800 : loss is 002.36 , accuracy on training set 6.25 %, accuracy on test set 15.71 %
step 2000 : loss is 002.24 , accuracy on training set 21.88 %, accuracy on test set 18.11 %
step 2200 : loss is 002.17 , accuracy on training set 15.62 %, accuracy on test set 18.08 %
step 2400 : loss is 002.17 , accuracy on training set 26.56 %, accuracy on test set 19.15 %
step 2600 : loss is 002.08 , accuracy on training set 21.88 %, accuracy on test set 23.43 %
step 2800 : loss is 002.16 , accuracy on training set 17.19 %, accuracy on test set 22.48 %
step 3000 : loss is 002.20 , accuracy on training set 23.44 %, accuracy on test set 24.98 %
step 3200 : loss is 002.02 , accuracy on training set 20.31 %, accuracy on test set 23.90 %
step 3400 : loss is 002.05 , accuracy on training set 25.00 %, accuracy on test set 26.83 %
step 3600 : loss is 002.07 , accuracy on training set 25.00 %, accuracy on test set 27.00 %
step 3800 : loss is 002.04 , accuracy on training set 28.12 %, accuracy on test set 26.71 %
step 4000 : loss is 002.09 , accuracy on training set 34.38 %, accuracy on test set 27.43 %
step 4200 : loss is 002.07 , accuracy on training set 20.31 %, accuracy on test set 27.20 %
step 4400 : loss is 001.88 , accuracy on training set 32.81 %, accuracy on test set 27.71 %
step 4600 : loss is 001.92 , accuracy on training set 29.69 %, accuracy on test set 26.62 %
step 4800 : loss is 002.10 , accuracy on training set 26.56 %, accuracy on test set 27.31 %
step 5000 : loss is 002.04 , accuracy on training set 35.94 %, accuracy on test set 28.26 %
step 5200 : loss is 001.96 , accuracy on training set 31.25 %, accuracy on test set 27.89 %
step 5400 : loss is 001.93 , accuracy on training set 31.25 %, accuracy on test set 28.65 %
step 5600 : loss is 002.12 , accuracy on training set 21.88 %, accuracy on test set 28.47 %
step 5800 : loss is 002.00 , accuracy on training set 28.12 %, accuracy on test set 28.49 %
step 6000 : loss is 001.97 , accuracy on training set 25.00 %, accuracy on test set 28.35 %
step 6200 : loss is 002.01 , accuracy on training set 29.69 %, accuracy on test set 29.60 %
step 6400 : loss is 001.96 , accuracy on training set 29.69 %, accuracy on test set 29.26 %
step 6600 : loss is 001.85 , accuracy on training set 34.38 %, accuracy on test set 29.77 %
step 6800 : loss is 001.73 , accuracy on training set 32.81 %, accuracy on test set 29.91 %
step 7000 : loss is 002.07 , accuracy on training set 28.12 %, accuracy on test set 30.49 %
In [67]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation -

  1. The Adagrad Optimizer performed better than the Adam Optimizer , but not as well as the Gradient Descent Optimizer atleast for 7000 epochs.
  2. The Testing and Training accuracies for the Adagrad optimizer did improve consistently through all 7000 epochs. Hence, though the AdaGrad optimizer did not reach the desirable accuracy within 7000 epochs , it would be interesting to see if it can surpass the accuracy provided by Gradient Descent Optimizer.
  3. As mentioned above that the CIFAR 10 Dataset takes about 150,000 iterations to train, the Adagrad Optimizer could be a promising optimizer to tune.

TRain Accuracy= 36% Test Accuracy= 28%

Final Result for Gradient Estimation Tuning

Though the Stochastic Gradient Descent Optimizer perfoemd the best , Adagrad Optimizer could be a promising parametr to tune. Additonally the Adam Optimizer did not seem like a desirable gradient Estimation value to tune

Hyper Parameter Tuning thr Network Architecture for CNN

Network Architecture

Change Connection Type

LENET LIKE

I have tuned parameters to change the connection type. In this case , there is a change in pooling . Average Pooling has been replaced by Max Pooling. Also incorporated dropouts which would let go of 50% of neurons from the flatten and fully connected layers.

There is a slight diffrence the weights and bias that have been assigned for the model. We have incorporated drop outs in the layers as well. Follow the comments to observe the change

In [68]:
LENET5_LIKE_BATCH_SIZE = 32
LENET5_LIKE_FILTER_SIZE = 5
LENET5_LIKE_FILTER_DEPTH = 16
LENET5_LIKE_NUM_HIDDEN = 120
 
# Create the function as before , rename it for convenience. The weights and bias remain the same    
def variables_lenet5_like(filter_size = LENET5_LIKE_FILTER_SIZE, 
                          filter_depth = LENET5_LIKE_FILTER_DEPTH, 
                          num_hidden = LENET5_LIKE_NUM_HIDDEN,
                          image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
 
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth]))
 
    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth, filter_depth], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth]))
 
    w3 = tf.Variable(tf.truncated_normal([(image_width // 4)*(image_width // 4)*filter_depth , num_hidden], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden]))
 
    w4 = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden]))
 
    w5 = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
                  'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
                  'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
                }
    return variables
 
# Here average ooling has been changed to max pooling.
def model_lenet5_like(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.nn.relu(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.max_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
 
    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='SAME')
    layer2_actv = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.max_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    
# Introduced drop outs , ie in different iterations 50% of the neurons will dropped out 
#from the flat layer and the fully connected layer 
    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.relu(layer3_fccd)
    layer3_drop = tf.nn.dropout(layer3_actv, 0.5)
 
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.relu(layer4_fccd)
    layer4_drop = tf.nn.dropout(layer4_actv, 0.5)
 
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

HYper Parameters for tuning the model remain the same. This code can be reused from the inital model.

In [69]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10

#number of iterations and learning rate
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5_like(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5_like
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.AdagradOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Reuse the code to start the tensorflow session and use the code from the inital model

In [70]:
train=[]
test=[]
display=[]
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with learning_rate 0.5
step 0000 : loss is 095.76 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 0200 : loss is 002.31 , accuracy on training set 14.06 %, accuracy on test set 10.01 %
step 0400 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0600 : loss is 002.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 0800 : loss is 002.31 , accuracy on training set 7.81 %, accuracy on test set 9.99 %
step 1000 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 9.99 %
step 1200 : loss is 002.30 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 1400 : loss is 002.30 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 1600 : loss is 002.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1800 : loss is 002.31 , accuracy on training set 17.19 %, accuracy on test set 10.00 %
step 2000 : loss is 002.30 , accuracy on training set 14.06 %, accuracy on test set 9.99 %
step 2200 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 2400 : loss is 002.30 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 2600 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 2800 : loss is 002.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 3000 : loss is 002.30 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 3200 : loss is 002.31 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 3400 : loss is 002.30 , accuracy on training set 3.12 %, accuracy on test set 10.00 %
step 3600 : loss is 002.30 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 3800 : loss is 002.31 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 4000 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 4200 : loss is 002.30 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 4400 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 4600 : loss is 002.29 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 4800 : loss is 002.31 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 5000 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 5200 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 5400 : loss is 002.30 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 5600 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 5800 : loss is 002.30 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 6000 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 6200 : loss is 002.30 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 6400 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 6600 : loss is 002.30 , accuracy on training set 6.25 %, accuracy on test set 10.00 %
step 6800 : loss is 002.31 , accuracy on training set 12.50 %, accuracy on test set 10.00 %
step 7000 : loss is 002.30 , accuracy on training set 7.81 %, accuracy on test set 10.00 %

Plot the graph as the initla model

In [71]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation

  1. It is observed that modifying the pooling layer did not help the Network to improve accuarcy.
  2. Also adding the decay rate to the network did not help the Neural Network.
  3. The testing accuracy did not improve and was constant through 7000 epochs.

Hence, by using 7000 epochs as a bench marck this network architecture did not improve in given time span. Additionally, the network would have to be run for a longer time to observe if the network would plateau. But observing the accuracies this would definetly lower the chances of the network to plateau

Train Accuracy= 17% Test Accuracy=10%

Part F

Network Architecture

Number of Layers

In order to understand the imporatnce of the convolution layers, I eliminated the Convolution layers and incorprated only the flattened layer and the fully connected layer. Fully Connected Layers - This neural network is connected as fully connected layers. Hence, only the flattened layer has been incorporated. This implies that there is only the fully connected layer.

Lets observe the structure as below. Follow the comments in the comments to observe changes

In [76]:
train=[]
test=[]
display=[]
import tensorflow as tf
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels 
 
#the dataset
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10
 
#number of iterations and learning rate
num_steps = 7001
display_step = 100
learning_rate = 0.5
batch_size=64
 
graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth),name='tf_train_dataset')
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)
  
    #2) Then, the weight matrices and bias vectors are initialized
    #as a default, tf.truncated_normal() is used for the weight matrix and tf.zeros() is used for the bias vector.
    weights = tf.Variable(tf.truncated_normal([image_width * image_height * image_depth, num_labels]), tf.float32)
    bias = tf.Variable(tf.zeros([num_labels]), tf.float32)
  
    #3) define the model:
    #A one layered fccd simply consists of a matrix multiplication
    # Got rid of the COnv2D Layers and only incorporated the flattened layer
    def model(data, weights, bias):
        return tf.matmul(flatten_tf_array(data), weights) + bias
 
    logits = model(tf_train_dataset, weights, bias)
 
    #4) calculate the loss, which will be used in the optimization of the weights
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
 
    #5) Choose an optimizer. Many are available.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimbbize(loss)
 
    #6) The predicted values for the images in the train dataset and test dataset are assigned to the variables train_prediction and test_prediction. 
    #It is only necessary if you want to know the accuracy by comparing it with the actual values. 
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, weights, bias))
 
 
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction],feed_dict=feed_dict)
        if (step % display_step == 0):
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized
step 0000 : loss is 7102.77 , accuracy on training set 14.06 %, accuracy on test set 10.00 %
step 0100 : loss is 4962667.50 , accuracy on training set 17.19 %, accuracy on test set 15.19 %
step 0200 : loss is 4915249.00 , accuracy on training set 29.69 %, accuracy on test set 20.04 %
step 0300 : loss is 2236246.25 , accuracy on training set 21.88 %, accuracy on test set 24.94 %
step 0400 : loss is 3146381.00 , accuracy on training set 12.50 %, accuracy on test set 21.21 %
step 0500 : loss is 2106159.50 , accuracy on training set 17.19 %, accuracy on test set 25.28 %
step 0600 : loss is 5896779.00 , accuracy on training set 14.06 %, accuracy on test set 18.94 %
step 0700 : loss is 3414244.00 , accuracy on training set 17.19 %, accuracy on test set 29.97 %
step 0800 : loss is 4998041.00 , accuracy on training set 14.06 %, accuracy on test set 22.98 %
step 0900 : loss is 3547517.75 , accuracy on training set 21.88 %, accuracy on test set 21.40 %
step 1000 : loss is 3409603.00 , accuracy on training set 31.25 %, accuracy on test set 23.69 %
step 1100 : loss is 4235497.50 , accuracy on training set 20.31 %, accuracy on test set 17.09 %
step 1200 : loss is 2827261.00 , accuracy on training set 28.12 %, accuracy on test set 21.36 %
step 1300 : loss is 4593059.00 , accuracy on training set 21.88 %, accuracy on test set 21.97 %
step 1400 : loss is 4363544.50 , accuracy on training set 26.56 %, accuracy on test set 24.29 %
step 1500 : loss is 3314686.25 , accuracy on training set 29.69 %, accuracy on test set 24.91 %
step 1600 : loss is 4803553.00 , accuracy on training set 23.44 %, accuracy on test set 18.79 %
step 1700 : loss is 3399794.50 , accuracy on training set 29.69 %, accuracy on test set 20.70 %
step 1800 : loss is 2805962.25 , accuracy on training set 12.50 %, accuracy on test set 24.26 %
step 1900 : loss is 3440516.00 , accuracy on training set 25.00 %, accuracy on test set 20.79 %
step 2000 : loss is 4189881.75 , accuracy on training set 21.88 %, accuracy on test set 22.17 %
step 2100 : loss is 3005617.50 , accuracy on training set 29.69 %, accuracy on test set 24.31 %
step 2200 : loss is 2296065.75 , accuracy on training set 26.56 %, accuracy on test set 24.72 %
step 2300 : loss is 5144552.00 , accuracy on training set 23.44 %, accuracy on test set 19.85 %
step 2400 : loss is 2852842.00 , accuracy on training set 26.56 %, accuracy on test set 19.24 %
step 2500 : loss is 5600452.00 , accuracy on training set 17.19 %, accuracy on test set 27.56 %
step 2600 : loss is 2736806.00 , accuracy on training set 21.88 %, accuracy on test set 27.47 %
step 2700 : loss is 2636084.00 , accuracy on training set 31.25 %, accuracy on test set 27.24 %
step 2800 : loss is 1477021.25 , accuracy on training set 39.06 %, accuracy on test set 22.48 %
step 2900 : loss is 3261680.50 , accuracy on training set 28.12 %, accuracy on test set 27.42 %
step 3000 : loss is 3873634.75 , accuracy on training set 18.75 %, accuracy on test set 24.59 %
step 3100 : loss is 2186226.00 , accuracy on training set 37.50 %, accuracy on test set 26.04 %
step 3200 : loss is 4123630.25 , accuracy on training set 25.00 %, accuracy on test set 28.04 %
step 3300 : loss is 2420481.50 , accuracy on training set 26.56 %, accuracy on test set 26.17 %
step 3400 : loss is 1945761.00 , accuracy on training set 31.25 %, accuracy on test set 22.82 %
step 3500 : loss is 2199370.75 , accuracy on training set 23.44 %, accuracy on test set 30.45 %
step 3600 : loss is 4559873.00 , accuracy on training set 15.62 %, accuracy on test set 26.14 %
step 3700 : loss is 2621394.00 , accuracy on training set 26.56 %, accuracy on test set 25.41 %
step 3800 : loss is 3928716.00 , accuracy on training set 18.75 %, accuracy on test set 21.29 %
step 3900 : loss is 4624236.00 , accuracy on training set 20.31 %, accuracy on test set 26.10 %
step 4000 : loss is 2319666.50 , accuracy on training set 25.00 %, accuracy on test set 29.45 %
step 4100 : loss is 2587915.50 , accuracy on training set 32.81 %, accuracy on test set 28.60 %
step 4200 : loss is 4441211.00 , accuracy on training set 21.88 %, accuracy on test set 25.63 %
step 4300 : loss is 3468578.50 , accuracy on training set 10.94 %, accuracy on test set 28.41 %
step 4400 : loss is 3632330.75 , accuracy on training set 20.31 %, accuracy on test set 25.06 %
step 4500 : loss is 3249917.50 , accuracy on training set 28.12 %, accuracy on test set 19.02 %
step 4600 : loss is 3606224.00 , accuracy on training set 35.94 %, accuracy on test set 27.32 %
step 4700 : loss is 2739944.25 , accuracy on training set 28.12 %, accuracy on test set 27.00 %
step 4800 : loss is 2942256.50 , accuracy on training set 32.81 %, accuracy on test set 21.56 %
step 4900 : loss is 2994335.25 , accuracy on training set 28.12 %, accuracy on test set 23.86 %
step 5000 : loss is 4587277.00 , accuracy on training set 26.56 %, accuracy on test set 27.29 %
step 5100 : loss is 2730795.50 , accuracy on training set 21.88 %, accuracy on test set 29.23 %
step 5200 : loss is 2696642.00 , accuracy on training set 31.25 %, accuracy on test set 23.56 %
step 5300 : loss is 3376483.50 , accuracy on training set 26.56 %, accuracy on test set 20.09 %
step 5400 : loss is 1932779.75 , accuracy on training set 34.38 %, accuracy on test set 29.97 %
step 5500 : loss is 1542333.75 , accuracy on training set 31.25 %, accuracy on test set 28.32 %
step 5600 : loss is 3100080.00 , accuracy on training set 23.44 %, accuracy on test set 20.52 %
step 5700 : loss is 3047641.25 , accuracy on training set 25.00 %, accuracy on test set 22.13 %
step 5800 : loss is 3610154.75 , accuracy on training set 26.56 %, accuracy on test set 23.52 %
step 5900 : loss is 3036787.00 , accuracy on training set 23.44 %, accuracy on test set 25.93 %
step 6000 : loss is 2121416.50 , accuracy on training set 23.44 %, accuracy on test set 21.77 %
step 6100 : loss is 4837542.50 , accuracy on training set 12.50 %, accuracy on test set 22.74 %
step 6200 : loss is 3673563.50 , accuracy on training set 26.56 %, accuracy on test set 23.34 %
step 6300 : loss is 3296692.00 , accuracy on training set 21.88 %, accuracy on test set 30.63 %
step 6400 : loss is 3846570.00 , accuracy on training set 31.25 %, accuracy on test set 30.88 %
step 6500 : loss is 3512010.00 , accuracy on training set 29.69 %, accuracy on test set 26.06 %
step 6600 : loss is 2930426.75 , accuracy on training set 34.38 %, accuracy on test set 28.62 %
step 6700 : loss is 3001428.00 , accuracy on training set 26.56 %, accuracy on test set 21.85 %
step 6800 : loss is 2275056.75 , accuracy on training set 32.81 %, accuracy on test set 27.74 %
step 6900 : loss is 2568419.00 , accuracy on training set 12.50 %, accuracy on test set 24.14 %
step 7000 : loss is 3217212.25 , accuracy on training set 17.19 %, accuracy on test set 21.96 %

plot the graph as done in the previous examples

In [77]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation

  1. It is clearly observed that using the fully connected layers cause computation to be faster. 7000 epochs were executed very fast.
  2. Though this is said, the accuracy did not improve. Comparing the accuracy reached in 7000 epochs was very less compared to the presence of the CNN.

Hence it is observed that using fully connected layers improved computation speed but did not improve the accuracy significantly. Hence, proving the imporatance of the Convolutional Layers.

Trainig Accuracy =39% (but no stabily and drastic rise and fall) Testing Accuracy= 32% (no stability)

Final Observation

It is observed that the Network Architecture plays an important role in the network performance. Other prospective parameters to change would include number of convolutional Layers, changing the pooling function with the activation function to name a few.

Hyper Parameter Tuning the Network Intialization for CNN

Network Initialization

For Network Initialization, I have used the Xavier Network Initialization and the random_normal initialization which is a kind of Gaussian Initialization.

Lets first have a look at Xavier Initialization.

Xavier Intialization

It helps signals reach deep into the network.

  1. If the weights in a network start too small, then the signal shrinks as it passes through each layer until it’s too tiny to be useful.
  2. If the weights in a network start too large, then the signal grows as it passes through each layer until it’s too massive to be useful.

Xavier initialization makes sure the weights are ‘just right’, keeping the signal in a reasonable range of values through many layers.

Implemetation of Tensorflow code

Lets see how to implement the Xavier Initialization

Reuse the code from the previous model and weight initialization and assigning it to the layers. Incorporate an additional parameter 'initializer' in the weights and bias for the variables_lenet5 function . Follow the comments below to see the changes made.

In [28]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
# add an initiaizer parameter to each of theb weights. Assign it as tf.contrib.layers.xavier_initializer()

    w1 = tf.get_variable("w1",[filter_size, filter_size, image_depth, filter_depth1],initializer=tf.contrib.layers.xavier_initializer())
#     w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.get_variable("w2",[filter_size, filter_size, filter_depth1, filter_depth2],initializer=tf.contrib.layers.xavier_initializer())
#     w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.get_variable("w3",[(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1],initializer=tf.contrib.layers.xavier_initializer())
#     w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.get_variable("w4",[num_hidden1, num_hidden2],initializer=tf.contrib.layers.xavier_initializer())
#     w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.get_variable("w5",[num_hidden2, num_labels],initializer=tf.contrib.layers.xavier_initializer())
#     w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Reuse the code for the hyper parameter initialization

In [31]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10

#number of iterations and learning rate
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Reuse the code to run the tensorflow session from the inital model

In [32]:
### running the tensorflow session
train=[]
test=[]
display=[]
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with learning_rate 0.5
step 0000 : loss is 002.99 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 0200 : loss is 002.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0400 : loss is 002.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 0600 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0800 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1000 : loss is 002.30 , accuracy on training set 4.69 %, accuracy on test set 10.00 %
step 1200 : loss is 002.31 , accuracy on training set 0.00 %, accuracy on test set 10.00 %
step 1400 : loss is 002.30 , accuracy on training set 3.12 %, accuracy on test set 10.00 %
step 1600 : loss is 002.28 , accuracy on training set 18.75 %, accuracy on test set 13.65 %
step 1800 : loss is 002.28 , accuracy on training set 15.62 %, accuracy on test set 10.99 %
step 2000 : loss is 002.22 , accuracy on training set 21.88 %, accuracy on test set 15.37 %
step 2200 : loss is 002.22 , accuracy on training set 15.62 %, accuracy on test set 17.01 %
step 2400 : loss is 002.20 , accuracy on training set 17.19 %, accuracy on test set 16.34 %
step 2600 : loss is 002.21 , accuracy on training set 14.06 %, accuracy on test set 17.16 %
step 2800 : loss is 002.27 , accuracy on training set 14.06 %, accuracy on test set 16.06 %
step 3000 : loss is 002.10 , accuracy on training set 17.19 %, accuracy on test set 18.67 %
step 3200 : loss is 002.22 , accuracy on training set 15.62 %, accuracy on test set 22.87 %
step 3400 : loss is 002.21 , accuracy on training set 18.75 %, accuracy on test set 24.26 %
step 3600 : loss is 002.13 , accuracy on training set 29.69 %, accuracy on test set 23.50 %
step 3800 : loss is 002.13 , accuracy on training set 20.31 %, accuracy on test set 24.86 %
step 4000 : loss is 002.13 , accuracy on training set 23.44 %, accuracy on test set 27.47 %
step 4200 : loss is 001.87 , accuracy on training set 31.25 %, accuracy on test set 28.81 %
step 4400 : loss is 001.89 , accuracy on training set 35.94 %, accuracy on test set 29.58 %
step 4600 : loss is 001.89 , accuracy on training set 28.12 %, accuracy on test set 29.30 %
step 4800 : loss is 001.88 , accuracy on training set 31.25 %, accuracy on test set 30.16 %
step 5000 : loss is 002.04 , accuracy on training set 28.12 %, accuracy on test set 32.92 %
step 5200 : loss is 001.92 , accuracy on training set 34.38 %, accuracy on test set 32.69 %
step 5400 : loss is 001.76 , accuracy on training set 35.94 %, accuracy on test set 33.79 %
step 5600 : loss is 001.86 , accuracy on training set 31.25 %, accuracy on test set 34.48 %
step 5800 : loss is 001.73 , accuracy on training set 31.25 %, accuracy on test set 34.32 %
step 6000 : loss is 001.95 , accuracy on training set 31.25 %, accuracy on test set 35.24 %
step 6200 : loss is 001.90 , accuracy on training set 32.81 %, accuracy on test set 36.90 %
step 6400 : loss is 001.98 , accuracy on training set 29.69 %, accuracy on test set 36.40 %
step 6600 : loss is 001.65 , accuracy on training set 46.88 %, accuracy on test set 37.12 %
step 6800 : loss is 002.22 , accuracy on training set 25.00 %, accuracy on test set 37.46 %
step 7000 : loss is 001.76 , accuracy on training set 40.62 %, accuracy on test set 34.34 %
In [9]:
##3 Plot the accuracy vs Epochs sae as the inital model
In [34]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation with network initialization :

It is observed that the Xavier Initialization definitely helped, as the testing and training accuracy are improved steadily with an almost constant decrease in Loss. Unlike earlier cases where the network woould get lost in various loss parts of the loss senario. It reached a similar accuracy of the benchmark 7000 epochs , where the training accuracy almost reached 40%.

Using this Weight initialization there is a chance of improving accuracy with increased number of epochs. Hence, tuning the weight initilization helped the network. Using this weight initialization might also use less number of epochs for the network to plateau.

Network Initialization fro CNN

Random Normal Initialization - Gaussian

Lets try the Gaussian Inialization to observe if there is a change in the accuracy of the model.

Tensorflow implemetation code

Reuse the code from the initla modelfor initializing the eights and bias. change the initializer from truncated_normal to random_normal. Random_Normal is a gaussain form of initialization. Follow the commenst to view changes in the tensorflow code

In [47]:
import tensorflow as tf

LENET5_BATCH_SIZE = 32
LENET5_FILTER_SIZE = 5
LENET5_FILTER_DEPTH_1 = 6
LENET5_FILTER_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

### Designing the weights and biases for the network
def variables_lenet5(filter_size = LENET5_FILTER_SIZE, filter_depth1 = LENET5_FILTER_DEPTH_1, 
                     filter_depth2 = LENET5_FILTER_DEPTH_2, 
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_width = 28, image_height = 28, image_depth = 1, num_labels = 10):
    
# Each of weights set to tf.random_normal   

    w1 = tf.Variable(tf.random_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
#     w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth1]))

    w2 = tf.Variable(tf.random_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
#     w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth1, filter_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth2]))

    w3 = tf.Variable(tf.random_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
#     w3 = tf.Variable(tf.truncated_normal([(image_width // 5)*(image_height // 5)*filter_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w4 = tf.Variable(tf.random_normal([num_hidden1, num_hidden2], stddev=0.1))
#     w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    
    w5 = tf.Variable(tf.random_normal([num_hidden2, num_labels], stddev=0.1))
#     w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables
### Setting up the layers and activation
def model_lenet5(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

Reuse the code for initlaizing the hyper parameter from the intial model

In [48]:
#parameters determining the model size
image_width = c10_image_width
image_height = c10_image_height
image_depth = c10_image_depth
num_labels = c10_num_labels

#the datasets
train_dataset = train_dataset_cifar10
train_labels = train_labels_cifar10 
test_dataset = test_dataset_cifar10
test_labels = test_labels_cifar10

#number of iterations and learning rate
num_steps = 7001
display_step = 200
learning_rate = 0.5
batch_size=64

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a tensorflow friendly form. 
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    variables = variables_lenet5(image_width = image_width, image_height=image_height, image_depth = image_depth, num_labels = num_labels)

    #3. The model used to calculate the logits (predicted labels)
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    #4. then we compute the softmax cross entropy between the logits and the (actual) labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #5. The optimizer is used to calculate the gradients of the loss function 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

Reuse the code for initializing the tensorflow session

In [49]:
### running the tensorflow session
train=[]
test=[]
display=[]
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
 
        #Since we are using stochastic gradient descent, we are selecting  small batches from the training dataset,
        #and training the convolutional neural network each time with a batch. 
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            train.append(train_accuracy)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            test.append(test_accuracy)
            display.append(step)
            message = "step {:04d} : loss is {:06.2f} , accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
Initialized with learning_rate 0.5
step 0000 : loss is 002.45 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0200 : loss is 002.31 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 0400 : loss is 002.31 , accuracy on training set 7.81 %, accuracy on test set 10.00 %
step 0600 : loss is 002.30 , accuracy on training set 10.94 %, accuracy on test set 10.00 %
step 0800 : loss is 002.30 , accuracy on training set 9.38 %, accuracy on test set 10.00 %
step 1000 : loss is 002.20 , accuracy on training set 12.50 %, accuracy on test set 15.03 %
step 1200 : loss is 002.18 , accuracy on training set 20.31 %, accuracy on test set 16.96 %
step 1400 : loss is 002.14 , accuracy on training set 17.19 %, accuracy on test set 20.74 %
step 1600 : loss is 002.01 , accuracy on training set 28.12 %, accuracy on test set 24.35 %
step 1800 : loss is 002.08 , accuracy on training set 26.56 %, accuracy on test set 25.52 %
step 2000 : loss is 001.87 , accuracy on training set 39.06 %, accuracy on test set 28.53 %
step 2200 : loss is 001.98 , accuracy on training set 20.31 %, accuracy on test set 27.40 %
step 2400 : loss is 001.90 , accuracy on training set 18.75 %, accuracy on test set 28.52 %
step 2600 : loss is 001.67 , accuracy on training set 37.50 %, accuracy on test set 31.25 %
step 2800 : loss is 001.75 , accuracy on training set 31.25 %, accuracy on test set 31.70 %
step 3000 : loss is 001.90 , accuracy on training set 28.12 %, accuracy on test set 34.58 %
step 3200 : loss is 002.12 , accuracy on training set 26.56 %, accuracy on test set 27.44 %
step 3400 : loss is 001.68 , accuracy on training set 45.31 %, accuracy on test set 30.65 %
step 3600 : loss is 001.91 , accuracy on training set 28.12 %, accuracy on test set 33.09 %
step 3800 : loss is 001.88 , accuracy on training set 34.38 %, accuracy on test set 37.77 %
step 4000 : loss is 002.06 , accuracy on training set 25.00 %, accuracy on test set 32.95 %
step 4200 : loss is 001.82 , accuracy on training set 34.38 %, accuracy on test set 36.95 %
step 4400 : loss is 001.73 , accuracy on training set 35.94 %, accuracy on test set 38.04 %
step 4600 : loss is 001.71 , accuracy on training set 39.06 %, accuracy on test set 36.48 %
step 4800 : loss is 001.80 , accuracy on training set 32.81 %, accuracy on test set 38.54 %
step 5000 : loss is 001.85 , accuracy on training set 34.38 %, accuracy on test set 39.55 %
step 5200 : loss is 001.76 , accuracy on training set 31.25 %, accuracy on test set 35.97 %
step 5400 : loss is 001.51 , accuracy on training set 40.62 %, accuracy on test set 39.46 %
step 5600 : loss is 001.81 , accuracy on training set 25.00 %, accuracy on test set 38.02 %
step 5800 : loss is 001.49 , accuracy on training set 45.31 %, accuracy on test set 39.78 %
step 6000 : loss is 001.77 , accuracy on training set 32.81 %, accuracy on test set 41.39 %
step 6200 : loss is 001.74 , accuracy on training set 39.06 %, accuracy on test set 42.23 %
step 6400 : loss is 001.63 , accuracy on training set 40.62 %, accuracy on test set 42.68 %
step 6600 : loss is 001.50 , accuracy on training set 45.31 %, accuracy on test set 42.35 %
step 6800 : loss is 002.10 , accuracy on training set 23.44 %, accuracy on test set 42.18 %
step 7000 : loss is 001.65 , accuracy on training set 42.19 %, accuracy on test set 39.52 %

Plott he graph as done earlier

In [50]:
# Graph to see the network plateau
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
plt.plot(display,test,label='validation')
plt.plot(display,train,label='training')
plt.legend(loc=0)
plt.xlabel('epochs')
plt.ylabel('accuracy')
# plt.xlim([1,display_step])
# plt.xlim(display)
#     plt.ylim([0,1])
plt.grid(True)
plt.title("Model Accuracy")
plt.show()
#     fig.savefig('img/'+str(i)+'-accuracy.jpg')
plt.close(fig)

Observation

  1. There is a clear improvement in the performance of the network as the training and testing accuracy has reached around 40% of accuracy in 7000 epochs.
  2. The Performance is better than the Xavier Optimization as well. This is because even the Testing accuracy reached around 40 %
  3. This network intialization would defintely help the network plateau better.
  4. The weight initialization also allows a steady change in the training and testing accuracy

Train Accuracy = 45% Test Accuracy= 42%

Final Observation for Netwrok Initialization

Therefore Network Initialization Xavier and Random_Normal Initalization will contribute to the the network plateauing consistenty. This an imporatnt parameter to look at while tuning a CNN Model, especially to bring consistency between the train and testing results and steady the decrease in the Losses.

Final Result for Hyper Parameter Tuning the CNN Model

After performing the various parameter tuning the following is observed

  1. The network currently works well with a learning rate of 0.5. Though it might make sense to try the a combination of learning rate and a diffrent activation function except sigmoid as there was no change with the Sigmoid Activation and learning rate.

  2. Activation Function : Currently the use of the Sigmoid Activation function works best and appears to rpovide the highest accuracy in 7000 epochs

  3. Loss :The Hinge Loss with activation function Tanh worked well and could be replaced by the Softmax function. But this would be a comparitive study to take up with the number of epochs

  4. Number of Epochs : As mentioned earlier, the network must be trained for atleast 150000 epochs. There is a clear increase in accuracy as the number of the epochs increases

  5. Gradient Estimation : The gradient descent Optimizer performs the best and contributes to the plateuing of the network considerable

  6. Network Initialization : played a major role in the network plateauing and the Xavier and Random_normal weight initlization must be run for a prolonged number of epochs to perform better , but as of now the Gaussain initialization outperformed the xavier intialization.

Summary

image.png

Conclusion

The Loss Function, Hinge Loss, Number of Epochs and the Network Intialization (Xavier and Gaussian) playeed the most important role to improve training and testing accuracy and can be run for larger epochs to see its effects on the CNN. As per benchmark the above mentioned parameters outperformed the others clearly

Recurrent Neural Networks

Recurrent Neural Networks is a class of Neural Networks where connections between units form a form a directed graph along a sequence. RNNs can use their internal state (memory) to process sequences of inputs and hence they are popularly used for Handwriting Recognition. An RNN with a gated state or gated memory are part of Long Short Term Memory Neural Network.

LSTMs contain information outside the normal flow of the recurrent network in a gated cell. Information can be stored in, written to, or read from a cell, much like data in a computer’s memory. The cell makes decisions about what to store, and when to allow reads, writes and erasures, via gates that open and close. Unlike the digital storage on computers, however, these gates are analog, implemented with element-wise multiplication by sigmoids, which are all in the range of 0-1.

Dataset Description

HasyV2 is a dataset similar to the Hello World Dataset MNIST for Handwriting recognition.This is a dataset of single symbols similar to MNIST. It contains 32px x 32px images of 168233 instances of 369 classes. In total, the dataset contains over 150,000 instances of handwritten symbols. This is used as a Hello world dataset for Handwriting recognition.

Since, this dataset consists of large number of images that would require additional computation power and a considerable amount of time to train , I have used a subset of the dataset that has been created bu Sumit Kothari. He has created the subset and stored the images and labels in numpy arrays using a subset of the data. The data can be found at following link below.

https://github.com/sumit-kothari/AlphaNum-HASYv2

Directions to download the dataset

Steps to use the dataset

[1] Download the dataset from the link below :

https://github.com/sumit-kothari/AlphaNum-HASYv2

[2] Download and store the dataset in the Jupyter Notebook path , where all other ipynb notebooks exist. The file path to store in are as follows :

The Jupyter Notebook path would look as follows :

image.png

[3] Files to Download

alphanum-hasy-data-X.npy alphanum-hasy-data-y.npy symbols.csv

[4] This will ensure that we can directly access the datasets

Details of RNN - LSTM Structure Used

The RNN-LSTM structure utilizes weights and bias initialized as random normal. The simplest form of an RNN known as the Static RNN Cell and Basic LSTM Cell is being used. In the case ofthe HasyV2 dataset , the size of a image is 32X32 pixels. The RNN would compute through 32 rows of the image each time. The RNN would receive 32 time steps , where one row (32 pixels) would be input. This would result in a full image in 32 timesteps. A batch size for the number of images will be supplied, such that every time step would be would be supplied with the enumerated batch size of images.

The learning rate selected is 0.001 and a batch size of 128. The opitmizer to reduce the softmax_cross entropy loss is the Adam Optimizer that is selected.

Steps to implement the Tensorflow code for the RNN LSTM Structure

First lets import all necessary libraries

In [36]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import urllib
import requests
from bs4 import BeautifulSoup
from pandas import DataFrame
import zipfile,io,os
import pandas as pd
from random import randint
from sklearn.model_selection import train_test_split
from tensorflow.contrib import rnn
from keras.utils import np_utils
Using TensorFlow backend.

Description of the Files

The file alphanum-hasy-data-X.npy consists of images as a numpy array and alphanum-hasy-data-y.npy consists of the image labels in the form of a numpy array. The Symbols.csv consists of the list of symbols with the latex. The latex field contains the label of the image. The CSV file also contains the count of the training and testing sample for each symbol.

Preprocessing the data

We load each of the files in this step

In [37]:
# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced from https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data by Sumit Kothari 
#which is public in the below 3 cells 
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

Each symbol has a latex that signifies the character. In order to see which symbol id is associated with the each symbol the following function has been written

In [14]:
# Code referenced from https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data by Sumit Kothari which is public in the below 3 cells 
def symbol_id_to_symbol(symbol_id = None):
    #first we check if the symbol id exists, if it does not exist send a message , else provide its latex value to the symbol
    if symbol_id:
        symbol_data = SYMBOLS.loc[SYMBOLS['symbol_id'] == symbol_id]
        if not symbol_data.empty:
            return str(symbol_data["latex"].values[0])
        else:
            print("This should not have happend, wrong symbol_id = ", symbol_id)
            return None
    else: 
        print("This should not have happend, no symbol id passed")
        return None        

# test some values
print("21 = ", symbol_id_to_symbol(21))
print("32 = ", symbol_id_to_symbol(32))
print("90 = ", symbol_id_to_symbol(90))
print("115 = ", symbol_id_to_symbol(95))
This should not have happend, wrong symbol_id =  21
21 =  None
32 =  B
90 =  a
115 =  f

In the above cell it can be seen that symbol_id 21 has no latex. The Symbol_id 32 has the handwritten digit 'B'.

Next, we are going to plot a few of the images , whose symbols are generated by the numpy random integer function. We use the random integer function to randomly generate images. Then we use the symbol_id_to_symbol function map a symbol images to its label

In [5]:
#plot images from the dataset
# Code referenced from https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data by Sumit Kothari which is public in the below 3 cells 
f, ax = plt.subplots(2, 3, figsize=(12, 10))
ax_x = 0
ax_y = 0

# plot 6 random images from the dataset genrated by the randint library
for i in range(6):
    randKey = randint(0, X_load.shape[0])
    ax[ax_x, ax_y].imshow(X_load[randKey], cmap='gray')
    ax[ax_x, ax_y].title.set_text("Value : " + symbol_id_to_symbol(y_load[randKey]))

    # for proper subplots
    if ax_x == 1:
        ax_x = 0
        ax_y = ax_y + 1
    else:
        ax_x = ax_x + 1

After visualizing the data lets split the data into training and testing using the Scikit Learn Library

In [3]:
# Split the data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Shape of Trained Dataset")
print(X_train.shape, y_train.shape)
print("Shape of Trained Dataset")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)
In [38]:
# the cells below which represent the architecture and running of the network has code that has been referenced 
#from https://jasdeep06.github.io/posts/Understanding-LSTM-in-Tensorflow-MNIST/ by author jasdeep06.
#The code below is based on jasdeep06 (no licence mentioned )
# this code has been modified further to adapt to the HasyV2 dataset

Running the code in Tensorflow

Lets define the network parameters. Since the image size is 32X32 pixel, the time_steps=32 and the n_input =32. The total number of classes in the subset are 116

hence,

n_classes=116 Next, we pre process the data and converting the labels to one hot encoded values.

Next we initilize the weights and bias for the model, following which we initalize the network parameters and define the place holders. Follow the steps below to create the network structure in Tensorflow.

In [7]:
# Defining the network parameters
#define constants
#unrolled through 32 time steps
time_steps=32
#hidden LSTM units
num_units=128
#rows of 32 pixels
n_input=32
#learning rate for adam
learning_rate=0.001
#hasyv2 has 116 classes.
n_classes=116
#size of batch
batch_size=128

Since, the LSTM model would take one hot encoded values , we have normalized the images and one hot encoded the labels.

In [8]:
# Normalize the training set
# Code referenced from https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
#by Sumit Kothari which is public in the below 3 cells 
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs which are the labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

#Lets check the total number of labels in the dataset
num_classes = y_test.shape[1]
print("num_classes = ", num_classes)
num_classes =  116

It is seen that the total number of labels in the dataset is 116

Weight Initialization for the network is as done below using a Gaussian dirtibution. If Weights and Bias are not initialized it was observed that for this dataset the training would attain very good accuracies but the testing accuracies would be extremely low. Hence, weight initialization plays a very important role for the RNN LSTM model. Next, we define the place holders where the input has a shape of 32X32 and the output has a shape of 116

In [9]:
out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

Next, we process the input tensor which has a shape of (batch_size,n_steps,n_input) for the time_steps into a list using the unstack function. This will help in feeding the input as a list to the the static RNN cell.

In [10]:
#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

Next, we define the Basic LSTM Cell and the staic RNN cell

In [11]:
#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

We define the prediction as a matrix multiplication of the weights. The Loss function is softmax_cross_entropy_with_logits and using the Adam Optimizer to reduce the loss. We define the accuracy evaluation as well.

In [12]:
#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias
In [13]:
#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
WARNING:tensorflow:From <ipython-input-13-2663a3fbb76b>:2: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

We define the a function next_batch which is similar to 'mnist.train.next_batch()' that is commonly used hile training the MNIST Data. This function helps to iterate through the batches while training the data. The mnist.train.next_batch() function is specific to Tensorlfow and the MNIST Dataset. Hence, the following function will help provide batch of images while iterating through all images for the HASYV2 Dataset.

In [14]:
# The code has been referenced from 
#https://stackoverflow.com/questions/40994583/how-to-implement-tensorflows-next-batch-for-own-data?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa 
#by author @edo
import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

Next we initialize the tensorflow session and train the model

We test the model after the model has been trained for 800 epochs. We initialize lists namely train_loss, train_accuracy and epoch to store the values in the tensorflow session which will subsequently be used to to visualize the Loss and Accuracy vs Epochs. Follow the comments in the code to understand the steps implemeted in detail

In [18]:
# Store the loss, accuracy and epochs in a list to plot the network performance
# Number of epochs is selected as 800
train_loss=[]
train_accuracy=[]
epoch=[]
#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    iter=1
    while iter<800:
#provide the images in batches
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)
#reshape the training data for the tensor
        batch_x=batch_x.reshape((batch_size,time_steps,n_input))
# apply the optimizer to the training data
        sess.run(opt, feed_dict={x: batch_x, y: batch_y})
# whenever the number of epochs is completely divisible by 10, the training accuracy is printed with the loss and number of epochs
        if iter %10==0:
            epoch.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("__________________")

        iter=iter+1
#print the testing accuracy
    test_data = X_test.reshape((-1, time_steps, n_input))
# test_label = mnist.test.labels[:128]
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
For iter  10
Accuracy  0.015625
Loss  4.606448
__________________
For iter  20
Accuracy  0.109375
Loss  4.1117506
__________________
For iter  30
Accuracy  0.078125
Loss  4.120659
__________________
For iter  40
Accuracy  0.078125
Loss  3.8419223
__________________
For iter  50
Accuracy  0.1171875
Loss  3.7634513
__________________
For iter  60
Accuracy  0.1953125
Loss  3.4659755
__________________
For iter  70
Accuracy  0.1640625
Loss  3.3037777
__________________
For iter  80
Accuracy  0.265625
Loss  2.9401085
__________________
For iter  90
Accuracy  0.359375
Loss  2.7006998
__________________
For iter  100
Accuracy  0.3359375
Loss  2.5387964
__________________
For iter  110
Accuracy  0.34375
Loss  2.5352707
__________________
For iter  120
Accuracy  0.4765625
Loss  2.1649003
__________________
For iter  130
Accuracy  0.359375
Loss  2.3421955
__________________
For iter  140
Accuracy  0.3828125
Loss  2.1188374
__________________
For iter  150
Accuracy  0.46875
Loss  2.2092035
__________________
For iter  160
Accuracy  0.5234375
Loss  1.6888072
__________________
For iter  170
Accuracy  0.53125
Loss  1.5332057
__________________
For iter  180
Accuracy  0.4921875
Loss  1.7272668
__________________
For iter  190
Accuracy  0.6484375
Loss  1.3001859
__________________
For iter  200
Accuracy  0.5625
Loss  1.4282789
__________________
For iter  210
Accuracy  0.6171875
Loss  1.2590742
__________________
For iter  220
Accuracy  0.640625
Loss  1.1831118
__________________
For iter  230
Accuracy  0.625
Loss  1.3309809
__________________
For iter  240
Accuracy  0.59375
Loss  1.3895179
__________________
For iter  250
Accuracy  0.6015625
Loss  1.348432
__________________
For iter  260
Accuracy  0.65625
Loss  1.0107019
__________________
For iter  270
Accuracy  0.625
Loss  1.233178
__________________
For iter  280
Accuracy  0.640625
Loss  1.0330999
__________________
For iter  290
Accuracy  0.703125
Loss  1.0316291
__________________
For iter  300
Accuracy  0.640625
Loss  1.0154957
__________________
For iter  310
Accuracy  0.7265625
Loss  0.97535133
__________________
For iter  320
Accuracy  0.734375
Loss  1.0459522
__________________
For iter  330
Accuracy  0.65625
Loss  1.0245804
__________________
For iter  340
Accuracy  0.59375
Loss  1.2857254
__________________
For iter  350
Accuracy  0.6328125
Loss  1.1288687
__________________
For iter  360
Accuracy  0.71875
Loss  0.96261907
__________________
For iter  370
Accuracy  0.7890625
Loss  0.8203682
__________________
For iter  380
Accuracy  0.7890625
Loss  0.7034595
__________________
For iter  390
Accuracy  0.7890625
Loss  0.6799098
__________________
For iter  400
Accuracy  0.734375
Loss  0.86777806
__________________
For iter  410
Accuracy  0.7734375
Loss  0.65008587
__________________
For iter  420
Accuracy  0.7890625
Loss  0.56625575
__________________
For iter  430
Accuracy  0.7421875
Loss  0.6156035
__________________
For iter  440
Accuracy  0.8125
Loss  0.6263678
__________________
For iter  450
Accuracy  0.796875
Loss  0.6478534
__________________
For iter  460
Accuracy  0.7421875
Loss  0.6941583
__________________
For iter  470
Accuracy  0.8203125
Loss  0.57836235
__________________
For iter  480
Accuracy  0.8359375
Loss  0.6175069
__________________
For iter  490
Accuracy  0.765625
Loss  0.7502252
__________________
For iter  500
Accuracy  0.7890625
Loss  0.5786598
__________________
For iter  510
Accuracy  0.8125
Loss  0.525266
__________________
For iter  520
Accuracy  0.8046875
Loss  0.5278046
__________________
For iter  530
Accuracy  0.8515625
Loss  0.47948524
__________________
For iter  540
Accuracy  0.8359375
Loss  0.46751928
__________________
For iter  550
Accuracy  0.78125
Loss  0.6319556
__________________
For iter  560
Accuracy  0.8203125
Loss  0.53351176
__________________
For iter  570
Accuracy  0.84375
Loss  0.42849228
__________________
For iter  580
Accuracy  0.859375
Loss  0.44223437
__________________
For iter  590
Accuracy  0.7890625
Loss  0.46571708
__________________
For iter  600
Accuracy  0.8203125
Loss  0.46818626
__________________
For iter  610
Accuracy  0.84375
Loss  0.42172128
__________________
For iter  620
Accuracy  0.8671875
Loss  0.37101144
__________________
For iter  630
Accuracy  0.859375
Loss  0.36509073
__________________
For iter  640
Accuracy  0.8515625
Loss  0.50779045
__________________
For iter  650
Accuracy  0.8515625
Loss  0.467653
__________________
For iter  660
Accuracy  0.890625
Loss  0.3344072
__________________
For iter  670
Accuracy  0.8046875
Loss  0.49784464
__________________
For iter  680
Accuracy  0.9140625
Loss  0.31700146
__________________
For iter  690
Accuracy  0.8359375
Loss  0.3969943
__________________
For iter  700
Accuracy  0.8671875
Loss  0.36839885
__________________
For iter  710
Accuracy  0.890625
Loss  0.2918483
__________________
For iter  720
Accuracy  0.8984375
Loss  0.3600924
__________________
For iter  730
Accuracy  0.84375
Loss  0.39472553
__________________
For iter  740
Accuracy  0.875
Loss  0.342823
__________________
For iter  750
Accuracy  0.921875
Loss  0.27532077
__________________
For iter  760
Accuracy  0.90625
Loss  0.33015752
__________________
For iter  770
Accuracy  0.9140625
Loss  0.26719558
__________________
For iter  780
Accuracy  0.8984375
Loss  0.3071798
__________________
For iter  790
Accuracy  0.9453125
Loss  0.20781943
__________________
Testing Accuracy: 0.68597996

Lets plot the training ccuracy by number of epochs and the Loss vs number of epochs

In [21]:
# plot train loss vs epoch
plt.figure(figsize=(18, 5))
plt.subplot(1, 2, 1)
plt.title('Train Loss vs Epoch', fontsize=15)
plt.plot(epoch, train_loss, 'r-')
plt.xlabel('Epoch')
plt.ylabel('Train Loss')

# plot train accuracy vs epoch
plt.subplot(1, 2, 2)
plt.title('Train Accuracy vs Epoch', fontsize=15)
plt.plot(epoch, train_accuracy, 'b-')
plt.xlabel('Epoch')
plt.ylabel('Train Accuracy')
plt.show()

Observation of the Initial Model

We observe that Training Accuracy reached 94.53% and the Testing Accuracy reaches 68.59%. From the above graphs we understand that the loss decreases at every 100 epochs and the accuarcy increases consistently every 100 epochs. The Network performace is overall good for the above parameters taken into consideration. It is was also observed that without weights and bias the training accuracy was very good but the testing accuracy was very low. This indicates that Network Initalization plays an important role in a RNN LSTM .

Training Accuracy = 94.53% Testing Accuracy=68.59%

Lets try and improve the model by tuning various hyper parameters. The following hyper parametrs will be tuned number of epochs,batch_size,number of neurons, combination of learning rate and number of neurons, optimizer and activation functions

Hyper Parameter Tuning for RNN

With the help of the LSTM Model we have achieved a training accuracy of 94.53% and a Testing accuracy of 68.57%. Next we will tune the following hyper parameters and observe the impact of each of these on the model.

Hyper Parameter Tuning Number of Epochs for RNN Model

We will tune the model with Number of Epochs - 500,1000,200. To do this the following step were implemented

  1. We first reset the Tensor flow graph in order to reset all variables.
  2. We reuse the code to load the data as from the Initial Model
  3. We split the data into training and testing as done before in the initla model
  4. Initilaize various hyper parameters such as optimizer, cost and loss functions
  5. Next we create a list of epochs so the tensorflow senssion can loop through the number of epcohs in the list and provide various accuracies.

Lets look at the code steps by step

Hyper Parameter Tuning for RNN : Number of Epochs

Number of Epochs selected to tune are 500,1000,2000 epochs

We first reset the Tensor flow graph in order to reset all variables in the graph.

In [2]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

We reuse the code to load the data as from the Initial Model

In [6]:
# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

We split the data into training and testing as done before in the initla model

In [7]:
#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Let create functions plot_loss_epoch which will provide the loss vs number of epochs completed. We also create a similar function to plot accuracy.

In [42]:
#Functions to plot loss and accuracy vs number of epochs
# plot train loss vs epoch
def plot_loss_epoch():
    plt.figure(figsize=(18, 5))
    plt.subplot(1, 2, 1)
    plt.title('Train Loss vs Epoch', fontsize=15)
#the list epoch_list and train_loss are initilazed in the session
    plt.plot(epoch_list, train_loss, 'r-')
    plt.xlabel('Epoch')
    plt.ylabel('Train Loss')

# plot train accuracy vs epoch
def plot_acc_epoch():
    plt.subplot(1, 2, 2)
    plt.title('Train Accuracy vs Epoch', fontsize=15)
   #the list epoch_list and train_accuracy are initilazed in the session 
    plt.plot(epoch_list, train_accuracy, 'b-')
    plt.xlabel('Epoch')
    plt.ylabel('Train Accuracy')
    plt.show()

Next, lets initilaize all hyper parameters, loss, optimizer and start the tensorflow session . This code has been reused from the previous model. In order to observe the models accuracy by number of epochs closely see the comments in the code below.

In [44]:
reset_graph()
#define constants
#unrolled through 32 time steps
time_steps=32
#hidden LSTM units
num_units=128
#rows of 32 pixels
n_input=32
#learning rate for adam
learning_rate=0.001
n_classes=116
#size of batch
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]
# as we want to tune number of epochs as 500,1000,2000 we have created a list with these values.
#following which there ill be a for loop that will iterate through this list
epoch=[500,1000,2000]
#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
#for loop through iterate through each values elected as number of epochs
    for e in epoch:
        train_loss=[]
        train_accuracy=[]
        epoch_list=[]
        iter=1
        print("Number Of Epoch:",e)
        while iter<e:
            batch_x,batch_y=next_batch(batch_size,X_train,y_train)

            batch_x=batch_x.reshape((batch_size,time_steps,n_input))

            sess.run(opt, feed_dict={x: batch_x, y: batch_y})

            if iter %10==0:
                epoch_list.append(iter)
                acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
                los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
                train_loss.append(los)
                train_accuracy.append(acc)
                print("For iter ",iter)
                print("Accuracy ",acc)
                print("Loss ",los)
                print("TOTAL EPOCHS:",e)
                print("__________________")
                

            iter=iter+1
        test_data = X_test.reshape((-1, time_steps, n_input))
# test_label = mnist.test.labels[:128]
        print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
        plot_loss_epoch()
        plot_acc_epoch()
num_classes =  116
Number Of Epoch: 500
For iter  10
Accuracy  0.0390625
Loss  4.937596
TOTAL EPOCHS: 500
__________________
For iter  20
Accuracy  0.0078125
Loss  4.459712
TOTAL EPOCHS: 500
__________________
For iter  30
Accuracy  0.0546875
Loss  4.1228666
TOTAL EPOCHS: 500
__________________
For iter  40
Accuracy  0.1015625
Loss  4.0301
TOTAL EPOCHS: 500
__________________
For iter  50
Accuracy  0.109375
Loss  3.8191836
TOTAL EPOCHS: 500
__________________
For iter  60
Accuracy  0.1875
Loss  3.5331404
TOTAL EPOCHS: 500
__________________
For iter  70
Accuracy  0.109375
Loss  3.6029274
TOTAL EPOCHS: 500
__________________
For iter  80
Accuracy  0.2109375
Loss  3.3010306
TOTAL EPOCHS: 500
__________________
For iter  90
Accuracy  0.2265625
Loss  2.9665475
TOTAL EPOCHS: 500
__________________
For iter  100
Accuracy  0.296875
Loss  2.8631234
TOTAL EPOCHS: 500
__________________
For iter  110
Accuracy  0.265625
Loss  2.6802545
TOTAL EPOCHS: 500
__________________
For iter  120
Accuracy  0.3515625
Loss  2.597422
TOTAL EPOCHS: 500
__________________
For iter  130
Accuracy  0.390625
Loss  2.329112
TOTAL EPOCHS: 500
__________________
For iter  140
Accuracy  0.421875
Loss  2.1368117
TOTAL EPOCHS: 500
__________________
For iter  150
Accuracy  0.421875
Loss  2.1284077
TOTAL EPOCHS: 500
__________________
For iter  160
Accuracy  0.5078125
Loss  1.8546052
TOTAL EPOCHS: 500
__________________
For iter  170
Accuracy  0.4765625
Loss  1.768396
TOTAL EPOCHS: 500
__________________
For iter  180
Accuracy  0.5078125
Loss  1.8365426
TOTAL EPOCHS: 500
__________________
For iter  190
Accuracy  0.4765625
Loss  1.8121133
TOTAL EPOCHS: 500
__________________
For iter  200
Accuracy  0.515625
Loss  1.6512227
TOTAL EPOCHS: 500
__________________
For iter  210
Accuracy  0.5703125
Loss  1.5656372
TOTAL EPOCHS: 500
__________________
For iter  220
Accuracy  0.5546875
Loss  1.7094377
TOTAL EPOCHS: 500
__________________
For iter  230
Accuracy  0.6015625
Loss  1.3318896
TOTAL EPOCHS: 500
__________________
For iter  240
Accuracy  0.5546875
Loss  1.6468952
TOTAL EPOCHS: 500
__________________
For iter  250
Accuracy  0.6171875
Loss  1.2737907
TOTAL EPOCHS: 500
__________________
For iter  260
Accuracy  0.640625
Loss  1.2126288
TOTAL EPOCHS: 500
__________________
For iter  270
Accuracy  0.640625
Loss  1.124316
TOTAL EPOCHS: 500
__________________
For iter  280
Accuracy  0.6171875
Loss  1.220243
TOTAL EPOCHS: 500
__________________
For iter  290
Accuracy  0.671875
Loss  1.1202152
TOTAL EPOCHS: 500
__________________
For iter  300
Accuracy  0.6640625
Loss  1.0736831
TOTAL EPOCHS: 500
__________________
For iter  310
Accuracy  0.671875
Loss  1.0117435
TOTAL EPOCHS: 500
__________________
For iter  320
Accuracy  0.6640625
Loss  1.160206
TOTAL EPOCHS: 500
__________________
For iter  330
Accuracy  0.703125
Loss  1.0956899
TOTAL EPOCHS: 500
__________________
For iter  340
Accuracy  0.65625
Loss  1.0443094
TOTAL EPOCHS: 500
__________________
For iter  350
Accuracy  0.7421875
Loss  0.76165223
TOTAL EPOCHS: 500
__________________
For iter  360
Accuracy  0.6953125
Loss  1.0127045
TOTAL EPOCHS: 500
__________________
For iter  370
Accuracy  0.6953125
Loss  0.90660155
TOTAL EPOCHS: 500
__________________
For iter  380
Accuracy  0.796875
Loss  0.72508943
TOTAL EPOCHS: 500
__________________
For iter  390
Accuracy  0.734375
Loss  0.98439217
TOTAL EPOCHS: 500
__________________
For iter  400
Accuracy  0.6875
Loss  0.9219308
TOTAL EPOCHS: 500
__________________
For iter  410
Accuracy  0.6953125
Loss  0.8569292
TOTAL EPOCHS: 500
__________________
For iter  420
Accuracy  0.734375
Loss  0.9264761
TOTAL EPOCHS: 500
__________________
For iter  430
Accuracy  0.734375
Loss  0.87300074
TOTAL EPOCHS: 500
__________________
For iter  440
Accuracy  0.765625
Loss  0.7437114
TOTAL EPOCHS: 500
__________________
For iter  450
Accuracy  0.796875
Loss  0.6935152
TOTAL EPOCHS: 500
__________________
For iter  460
Accuracy  0.7890625
Loss  0.76248527
TOTAL EPOCHS: 500
__________________
For iter  470
Accuracy  0.734375
Loss  0.7832147
TOTAL EPOCHS: 500
__________________
For iter  480
Accuracy  0.7890625
Loss  0.69854164
TOTAL EPOCHS: 500
__________________
For iter  490
Accuracy  0.796875
Loss  0.6507801
TOTAL EPOCHS: 500
__________________
Testing Accuracy: 0.64806867
Number Of Epoch: 1000
For iter  10
Accuracy  0.7890625
Loss  0.58737326
TOTAL EPOCHS: 1000
__________________
For iter  20
Accuracy  0.859375
Loss  0.52995515
TOTAL EPOCHS: 1000
__________________
For iter  30
Accuracy  0.734375
Loss  0.6924572
TOTAL EPOCHS: 1000
__________________
For iter  40
Accuracy  0.8203125
Loss  0.55341053
TOTAL EPOCHS: 1000
__________________
For iter  50
Accuracy  0.828125
Loss  0.5484019
TOTAL EPOCHS: 1000
__________________
For iter  60
Accuracy  0.8359375
Loss  0.54575837
TOTAL EPOCHS: 1000
__________________
For iter  70
Accuracy  0.859375
Loss  0.47422832
TOTAL EPOCHS: 1000
__________________
For iter  80
Accuracy  0.859375
Loss  0.45950186
TOTAL EPOCHS: 1000
__________________
For iter  90
Accuracy  0.890625
Loss  0.38748038
TOTAL EPOCHS: 1000
__________________
For iter  100
Accuracy  0.875
Loss  0.42833814
TOTAL EPOCHS: 1000
__________________
For iter  110
Accuracy  0.859375
Loss  0.5480968
TOTAL EPOCHS: 1000
__________________
For iter  120
Accuracy  0.859375
Loss  0.4932591
TOTAL EPOCHS: 1000
__________________
For iter  130
Accuracy  0.796875
Loss  0.53596914
TOTAL EPOCHS: 1000
__________________
For iter  140
Accuracy  0.84375
Loss  0.4961468
TOTAL EPOCHS: 1000
__________________
For iter  150
Accuracy  0.859375
Loss  0.35863525
TOTAL EPOCHS: 1000
__________________
For iter  160
Accuracy  0.90625
Loss  0.307157
TOTAL EPOCHS: 1000
__________________
For iter  170
Accuracy  0.8515625
Loss  0.35506326
TOTAL EPOCHS: 1000
__________________
For iter  180
Accuracy  0.875
Loss  0.3833353
TOTAL EPOCHS: 1000
__________________
For iter  190
Accuracy  0.8359375
Loss  0.43908268
TOTAL EPOCHS: 1000
__________________
For iter  200
Accuracy  0.859375
Loss  0.35385025
TOTAL EPOCHS: 1000
__________________
For iter  210
Accuracy  0.859375
Loss  0.4457486
TOTAL EPOCHS: 1000
__________________
For iter  220
Accuracy  0.8828125
Loss  0.38974306
TOTAL EPOCHS: 1000
__________________
For iter  230
Accuracy  0.8671875
Loss  0.429394
TOTAL EPOCHS: 1000
__________________
For iter  240
Accuracy  0.8671875
Loss  0.43319303
TOTAL EPOCHS: 1000
__________________
For iter  250
Accuracy  0.9375
Loss  0.29507458
TOTAL EPOCHS: 1000
__________________
For iter  260
Accuracy  0.84375
Loss  0.40075618
TOTAL EPOCHS: 1000
__________________
For iter  270
Accuracy  0.8828125
Loss  0.3501325
TOTAL EPOCHS: 1000
__________________
For iter  280
Accuracy  0.9453125
Loss  0.24412474
TOTAL EPOCHS: 1000
__________________
For iter  290
Accuracy  0.8671875
Loss  0.3746681
TOTAL EPOCHS: 1000
__________________
For iter  300
Accuracy  0.921875
Loss  0.25777274
TOTAL EPOCHS: 1000
__________________
For iter  310
Accuracy  0.9140625
Loss  0.2839486
TOTAL EPOCHS: 1000
__________________
For iter  320
Accuracy  0.8828125
Loss  0.2897703
TOTAL EPOCHS: 1000
__________________
For iter  330
Accuracy  0.9296875
Loss  0.28936297
TOTAL EPOCHS: 1000
__________________
For iter  340
Accuracy  0.9140625
Loss  0.26448873
TOTAL EPOCHS: 1000
__________________
For iter  350
Accuracy  0.9296875
Loss  0.29457673
TOTAL EPOCHS: 1000
__________________
For iter  360
Accuracy  0.9140625
Loss  0.24799073
TOTAL EPOCHS: 1000
__________________
For iter  370
Accuracy  0.9296875
Loss  0.27005666
TOTAL EPOCHS: 1000
__________________
For iter  380
Accuracy  0.8671875
Loss  0.30795282
TOTAL EPOCHS: 1000
__________________
For iter  390
Accuracy  0.9609375
Loss  0.1968894
TOTAL EPOCHS: 1000
__________________
For iter  400
Accuracy  0.9765625
Loss  0.14705674
TOTAL EPOCHS: 1000
__________________
For iter  410
Accuracy  0.9296875
Loss  0.23831077
TOTAL EPOCHS: 1000
__________________
For iter  420
Accuracy  0.9453125
Loss  0.21750152
TOTAL EPOCHS: 1000
__________________
For iter  430
Accuracy  0.9140625
Loss  0.23521557
TOTAL EPOCHS: 1000
__________________
For iter  440
Accuracy  0.9375
Loss  0.29251695
TOTAL EPOCHS: 1000
__________________
For iter  450
Accuracy  0.96875
Loss  0.15287597
TOTAL EPOCHS: 1000
__________________
For iter  460
Accuracy  0.9609375
Loss  0.2211093
TOTAL EPOCHS: 1000
__________________
For iter  470
Accuracy  0.953125
Loss  0.15537837
TOTAL EPOCHS: 1000
__________________
For iter  480
Accuracy  0.9609375
Loss  0.20879404
TOTAL EPOCHS: 1000
__________________
For iter  490
Accuracy  0.9609375
Loss  0.16931064
TOTAL EPOCHS: 1000
__________________
For iter  500
Accuracy  0.9375
Loss  0.23977917
TOTAL EPOCHS: 1000
__________________
For iter  510
Accuracy  0.953125
Loss  0.19164173
TOTAL EPOCHS: 1000
__________________
For iter  520
Accuracy  0.9140625
Loss  0.26932824
TOTAL EPOCHS: 1000
__________________
For iter  530
Accuracy  0.8671875
Loss  0.306093
TOTAL EPOCHS: 1000
__________________
For iter  540
Accuracy  0.953125
Loss  0.17415567
TOTAL EPOCHS: 1000
__________________
For iter  550
Accuracy  0.921875
Loss  0.21733883
TOTAL EPOCHS: 1000
__________________
For iter  560
Accuracy  0.96875
Loss  0.13391818
TOTAL EPOCHS: 1000
__________________
For iter  570
Accuracy  0.9375
Loss  0.18726322
TOTAL EPOCHS: 1000
__________________
For iter  580
Accuracy  0.9765625
Loss  0.13752064
TOTAL EPOCHS: 1000
__________________
For iter  590
Accuracy  0.9453125
Loss  0.20730501
TOTAL EPOCHS: 1000
__________________
For iter  600
Accuracy  0.9296875
Loss  0.15834466
TOTAL EPOCHS: 1000
__________________
For iter  610
Accuracy  0.9609375
Loss  0.12811647
TOTAL EPOCHS: 1000
__________________
For iter  620
Accuracy  0.9609375
Loss  0.1526624
TOTAL EPOCHS: 1000
__________________
For iter  630
Accuracy  0.953125
Loss  0.16270337
TOTAL EPOCHS: 1000
__________________
For iter  640
Accuracy  0.96875
Loss  0.12274715
TOTAL EPOCHS: 1000
__________________
For iter  650
Accuracy  0.9765625
Loss  0.12633114
TOTAL EPOCHS: 1000
__________________
For iter  660
Accuracy  0.96875
Loss  0.13607618
TOTAL EPOCHS: 1000
__________________
For iter  670
Accuracy  0.96875
Loss  0.12294298
TOTAL EPOCHS: 1000
__________________
For iter  680
Accuracy  0.96875
Loss  0.1108928
TOTAL EPOCHS: 1000
__________________
For iter  690
Accuracy  0.9609375
Loss  0.12705275
TOTAL EPOCHS: 1000
__________________
For iter  700
Accuracy  0.9765625
Loss  0.08566496
TOTAL EPOCHS: 1000
__________________
For iter  710
Accuracy  0.9765625
Loss  0.12299916
TOTAL EPOCHS: 1000
__________________
For iter  720
Accuracy  0.96875
Loss  0.13696712
TOTAL EPOCHS: 1000
__________________
For iter  730
Accuracy  0.9609375
Loss  0.111877516
TOTAL EPOCHS: 1000
__________________
For iter  740
Accuracy  0.984375
Loss  0.10863577
TOTAL EPOCHS: 1000
__________________
For iter  750
Accuracy  0.9609375
Loss  0.16531101
TOTAL EPOCHS: 1000
__________________
For iter  760
Accuracy  0.984375
Loss  0.10488382
TOTAL EPOCHS: 1000
__________________
For iter  770
Accuracy  0.9453125
Loss  0.1533864
TOTAL EPOCHS: 1000
__________________
For iter  780
Accuracy  0.9921875
Loss  0.09016454
TOTAL EPOCHS: 1000
__________________
For iter  790
Accuracy  0.96875
Loss  0.12609129
TOTAL EPOCHS: 1000
__________________
For iter  800
Accuracy  0.953125
Loss  0.14280947
TOTAL EPOCHS: 1000
__________________
For iter  810
Accuracy  0.9765625
Loss  0.09505591
TOTAL EPOCHS: 1000
__________________
For iter  820
Accuracy  0.96875
Loss  0.13512637
TOTAL EPOCHS: 1000
__________________
For iter  830
Accuracy  0.953125
Loss  0.12690991
TOTAL EPOCHS: 1000
__________________
For iter  840
Accuracy  0.9375
Loss  0.12667021
TOTAL EPOCHS: 1000
__________________
For iter  850
Accuracy  0.9765625
Loss  0.097699866
TOTAL EPOCHS: 1000
__________________
For iter  860
Accuracy  0.96875
Loss  0.11670043
TOTAL EPOCHS: 1000
__________________
For iter  870
Accuracy  0.984375
Loss  0.10416028
TOTAL EPOCHS: 1000
__________________
For iter  880
Accuracy  0.9765625
Loss  0.09745209
TOTAL EPOCHS: 1000
__________________
For iter  890
Accuracy  0.984375
Loss  0.088378504
TOTAL EPOCHS: 1000
__________________
For iter  900
Accuracy  0.9765625
Loss  0.080849394
TOTAL EPOCHS: 1000
__________________
For iter  910
Accuracy  0.984375
Loss  0.08729801
TOTAL EPOCHS: 1000
__________________
For iter  920
Accuracy  0.9921875
Loss  0.07353224
TOTAL EPOCHS: 1000
__________________
For iter  930
Accuracy  0.9609375
Loss  0.122939266
TOTAL EPOCHS: 1000
__________________
For iter  940
Accuracy  0.984375
Loss  0.11543915
TOTAL EPOCHS: 1000
__________________
For iter  950
Accuracy  0.953125
Loss  0.16540474
TOTAL EPOCHS: 1000
__________________
For iter  960
Accuracy  0.9609375
Loss  0.12390185
TOTAL EPOCHS: 1000
__________________
For iter  970
Accuracy  0.96875
Loss  0.10995085
TOTAL EPOCHS: 1000
__________________
For iter  980
Accuracy  0.9765625
Loss  0.09836049
TOTAL EPOCHS: 1000
__________________
For iter  990
Accuracy  0.9375
Loss  0.11741346
TOTAL EPOCHS: 1000
__________________
Testing Accuracy: 0.70028615
Number Of Epoch: 2000
For iter  10
Accuracy  0.984375
Loss  0.08188081
TOTAL EPOCHS: 2000
__________________
For iter  20
Accuracy  0.984375
Loss  0.06699079
TOTAL EPOCHS: 2000
__________________
For iter  30
Accuracy  0.9921875
Loss  0.07689768
TOTAL EPOCHS: 2000
__________________
For iter  40
Accuracy  0.9921875
Loss  0.049105264
TOTAL EPOCHS: 2000
__________________
For iter  50
Accuracy  0.984375
Loss  0.06374621
TOTAL EPOCHS: 2000
__________________
For iter  60
Accuracy  0.9921875
Loss  0.07009175
TOTAL EPOCHS: 2000
__________________
For iter  70
Accuracy  0.9765625
Loss  0.07708335
TOTAL EPOCHS: 2000
__________________
For iter  80
Accuracy  0.96875
Loss  0.07015157
TOTAL EPOCHS: 2000
__________________
For iter  90
Accuracy  0.96875
Loss  0.097114235
TOTAL EPOCHS: 2000
__________________
For iter  100
Accuracy  0.9921875
Loss  0.04862974
TOTAL EPOCHS: 2000
__________________
For iter  110
Accuracy  1.0
Loss  0.04451923
TOTAL EPOCHS: 2000
__________________
For iter  120
Accuracy  1.0
Loss  0.033838198
TOTAL EPOCHS: 2000
__________________
For iter  130
Accuracy  0.984375
Loss  0.05970563
TOTAL EPOCHS: 2000
__________________
For iter  140
Accuracy  1.0
Loss  0.06416053
TOTAL EPOCHS: 2000
__________________
For iter  150
Accuracy  0.96875
Loss  0.10382402
TOTAL EPOCHS: 2000
__________________
For iter  160
Accuracy  1.0
Loss  0.06000905
TOTAL EPOCHS: 2000
__________________
For iter  170
Accuracy  0.9765625
Loss  0.07783572
TOTAL EPOCHS: 2000
__________________
For iter  180
Accuracy  0.984375
Loss  0.047422938
TOTAL EPOCHS: 2000
__________________
For iter  190
Accuracy  0.984375
Loss  0.059234947
TOTAL EPOCHS: 2000
__________________
For iter  200
Accuracy  0.9921875
Loss  0.055258088
TOTAL EPOCHS: 2000
__________________
For iter  210
Accuracy  0.9921875
Loss  0.044408485
TOTAL EPOCHS: 2000
__________________
For iter  220
Accuracy  0.9921875
Loss  0.068218425
TOTAL EPOCHS: 2000
__________________
For iter  230
Accuracy  0.984375
Loss  0.06651513
TOTAL EPOCHS: 2000
__________________
For iter  240
Accuracy  1.0
Loss  0.0326385
TOTAL EPOCHS: 2000
__________________
For iter  250
Accuracy  0.984375
Loss  0.061822634
TOTAL EPOCHS: 2000
__________________
For iter  260
Accuracy  0.9921875
Loss  0.061921243
TOTAL EPOCHS: 2000
__________________
For iter  270
Accuracy  1.0
Loss  0.05352918
TOTAL EPOCHS: 2000
__________________
For iter  280
Accuracy  0.9921875
Loss  0.038660392
TOTAL EPOCHS: 2000
__________________
For iter  290
Accuracy  0.984375
Loss  0.065444164
TOTAL EPOCHS: 2000
__________________
For iter  300
Accuracy  0.984375
Loss  0.07349113
TOTAL EPOCHS: 2000
__________________
For iter  310
Accuracy  1.0
Loss  0.046825252
TOTAL EPOCHS: 2000
__________________
For iter  320
Accuracy  0.9921875
Loss  0.0591097
TOTAL EPOCHS: 2000
__________________
For iter  330
Accuracy  1.0
Loss  0.047939025
TOTAL EPOCHS: 2000
__________________
For iter  340
Accuracy  0.9921875
Loss  0.04981237
TOTAL EPOCHS: 2000
__________________
For iter  350
Accuracy  1.0
Loss  0.05127027
TOTAL EPOCHS: 2000
__________________
For iter  360
Accuracy  0.984375
Loss  0.05620957
TOTAL EPOCHS: 2000
__________________
For iter  370
Accuracy  1.0
Loss  0.0551237
TOTAL EPOCHS: 2000
__________________
For iter  380
Accuracy  0.984375
Loss  0.059611235
TOTAL EPOCHS: 2000
__________________
For iter  390
Accuracy  1.0
Loss  0.03331601
TOTAL EPOCHS: 2000
__________________
For iter  400
Accuracy  0.9921875
Loss  0.04095623
TOTAL EPOCHS: 2000
__________________
For iter  410
Accuracy  1.0
Loss  0.028761473
TOTAL EPOCHS: 2000
__________________
For iter  420
Accuracy  0.984375
Loss  0.057911064
TOTAL EPOCHS: 2000
__________________
For iter  430
Accuracy  0.984375
Loss  0.059047233
TOTAL EPOCHS: 2000
__________________
For iter  440
Accuracy  0.9921875
Loss  0.037568185
TOTAL EPOCHS: 2000
__________________
For iter  450
Accuracy  0.9765625
Loss  0.07287029
TOTAL EPOCHS: 2000
__________________
For iter  460
Accuracy  1.0
Loss  0.029959988
TOTAL EPOCHS: 2000
__________________
For iter  470
Accuracy  0.9921875
Loss  0.037752986
TOTAL EPOCHS: 2000
__________________
For iter  480
Accuracy  1.0
Loss  0.028675538
TOTAL EPOCHS: 2000
__________________
For iter  490
Accuracy  0.984375
Loss  0.052357092
TOTAL EPOCHS: 2000
__________________
For iter  500
Accuracy  1.0
Loss  0.029036216
TOTAL EPOCHS: 2000
__________________
For iter  510
Accuracy  0.984375
Loss  0.048610736
TOTAL EPOCHS: 2000
__________________
For iter  520
Accuracy  0.9765625
Loss  0.06303997
TOTAL EPOCHS: 2000
__________________
For iter  530
Accuracy  1.0
Loss  0.024855595
TOTAL EPOCHS: 2000
__________________
For iter  540
Accuracy  0.9921875
Loss  0.047069468
TOTAL EPOCHS: 2000
__________________
For iter  550
Accuracy  1.0
Loss  0.03929275
TOTAL EPOCHS: 2000
__________________
For iter  560
Accuracy  1.0
Loss  0.048658438
TOTAL EPOCHS: 2000
__________________
For iter  570
Accuracy  0.984375
Loss  0.06146142
TOTAL EPOCHS: 2000
__________________
For iter  580
Accuracy  0.9765625
Loss  0.060782213
TOTAL EPOCHS: 2000
__________________
For iter  590
Accuracy  1.0
Loss  0.03569769
TOTAL EPOCHS: 2000
__________________
For iter  600
Accuracy  1.0
Loss  0.038108103
TOTAL EPOCHS: 2000
__________________
For iter  610
Accuracy  1.0
Loss  0.042858247
TOTAL EPOCHS: 2000
__________________
For iter  620
Accuracy  1.0
Loss  0.027506389
TOTAL EPOCHS: 2000
__________________
For iter  630
Accuracy  1.0
Loss  0.03112321
TOTAL EPOCHS: 2000
__________________
For iter  640
Accuracy  1.0
Loss  0.037420526
TOTAL EPOCHS: 2000
__________________
For iter  650
Accuracy  0.9921875
Loss  0.039378446
TOTAL EPOCHS: 2000
__________________
For iter  660
Accuracy  1.0
Loss  0.025726333
TOTAL EPOCHS: 2000
__________________
For iter  670
Accuracy  1.0
Loss  0.054270122
TOTAL EPOCHS: 2000
__________________
For iter  680
Accuracy  0.984375
Loss  0.064493835
TOTAL EPOCHS: 2000
__________________
For iter  690
Accuracy  0.9765625
Loss  0.071913056
TOTAL EPOCHS: 2000
__________________
For iter  700
Accuracy  0.984375
Loss  0.10847211
TOTAL EPOCHS: 2000
__________________
For iter  710
Accuracy  1.0
Loss  0.044413548
TOTAL EPOCHS: 2000
__________________
For iter  720
Accuracy  0.96875
Loss  0.0879825
TOTAL EPOCHS: 2000
__________________
For iter  730
Accuracy  0.9921875
Loss  0.07852478
TOTAL EPOCHS: 2000
__________________
For iter  740
Accuracy  0.9140625
Loss  0.16318007
TOTAL EPOCHS: 2000
__________________
For iter  750
Accuracy  0.984375
Loss  0.08651897
TOTAL EPOCHS: 2000
__________________
For iter  760
Accuracy  0.9921875
Loss  0.05527693
TOTAL EPOCHS: 2000
__________________
For iter  770
Accuracy  0.96875
Loss  0.08775739
TOTAL EPOCHS: 2000
__________________
For iter  780
Accuracy  0.984375
Loss  0.07199367
TOTAL EPOCHS: 2000
__________________
For iter  790
Accuracy  1.0
Loss  0.053550713
TOTAL EPOCHS: 2000
__________________
For iter  800
Accuracy  0.9921875
Loss  0.05760341
TOTAL EPOCHS: 2000
__________________
For iter  810
Accuracy  1.0
Loss  0.036800697
TOTAL EPOCHS: 2000
__________________
For iter  820
Accuracy  1.0
Loss  0.031041708
TOTAL EPOCHS: 2000
__________________
For iter  830
Accuracy  0.984375
Loss  0.07621955
TOTAL EPOCHS: 2000
__________________
For iter  840
Accuracy  0.9921875
Loss  0.022913799
TOTAL EPOCHS: 2000
__________________
For iter  850
Accuracy  1.0
Loss  0.022765849
TOTAL EPOCHS: 2000
__________________
For iter  860
Accuracy  0.984375
Loss  0.03450572
TOTAL EPOCHS: 2000
__________________
For iter  870
Accuracy  1.0
Loss  0.026236534
TOTAL EPOCHS: 2000
__________________
For iter  880
Accuracy  0.9921875
Loss  0.032714713
TOTAL EPOCHS: 2000
__________________
For iter  890
Accuracy  0.9921875
Loss  0.039520428
TOTAL EPOCHS: 2000
__________________
For iter  900
Accuracy  0.9921875
Loss  0.031177118
TOTAL EPOCHS: 2000
__________________
For iter  910
Accuracy  1.0
Loss  0.025905745
TOTAL EPOCHS: 2000
__________________
For iter  920
Accuracy  0.984375
Loss  0.031921
TOTAL EPOCHS: 2000
__________________
For iter  930
Accuracy  0.984375
Loss  0.036159515
TOTAL EPOCHS: 2000
__________________
For iter  940
Accuracy  0.9921875
Loss  0.020449942
TOTAL EPOCHS: 2000
__________________
For iter  950
Accuracy  0.9921875
Loss  0.04251081
TOTAL EPOCHS: 2000
__________________
For iter  960
Accuracy  0.9921875
Loss  0.03556306
TOTAL EPOCHS: 2000
__________________
For iter  970
Accuracy  1.0
Loss  0.041158408
TOTAL EPOCHS: 2000
__________________
For iter  980
Accuracy  1.0
Loss  0.04391477
TOTAL EPOCHS: 2000
__________________
For iter  990
Accuracy  0.984375
Loss  0.053180747
TOTAL EPOCHS: 2000
__________________
For iter  1000
Accuracy  1.0
Loss  0.022809394
TOTAL EPOCHS: 2000
__________________
For iter  1010
Accuracy  0.9921875
Loss  0.027122311
TOTAL EPOCHS: 2000
__________________
For iter  1020
Accuracy  0.9921875
Loss  0.026603244
TOTAL EPOCHS: 2000
__________________
For iter  1030
Accuracy  1.0
Loss  0.010779056
TOTAL EPOCHS: 2000
__________________
For iter  1040
Accuracy  1.0
Loss  0.014338134
TOTAL EPOCHS: 2000
__________________
For iter  1050
Accuracy  1.0
Loss  0.024442762
TOTAL EPOCHS: 2000
__________________
For iter  1060
Accuracy  1.0
Loss  0.010953408
TOTAL EPOCHS: 2000
__________________
For iter  1070
Accuracy  1.0
Loss  0.00860833
TOTAL EPOCHS: 2000
__________________
For iter  1080
Accuracy  0.9921875
Loss  0.018860312
TOTAL EPOCHS: 2000
__________________
For iter  1090
Accuracy  0.9921875
Loss  0.024239238
TOTAL EPOCHS: 2000
__________________
For iter  1100
Accuracy  0.984375
Loss  0.031634506
TOTAL EPOCHS: 2000
__________________
For iter  1110
Accuracy  1.0
Loss  0.009783618
TOTAL EPOCHS: 2000
__________________
For iter  1120
Accuracy  1.0
Loss  0.0058746245
TOTAL EPOCHS: 2000
__________________
For iter  1130
Accuracy  0.9921875
Loss  0.011358859
TOTAL EPOCHS: 2000
__________________
For iter  1140
Accuracy  1.0
Loss  0.0077910563
TOTAL EPOCHS: 2000
__________________
For iter  1150
Accuracy  1.0
Loss  0.012014767
TOTAL EPOCHS: 2000
__________________
For iter  1160
Accuracy  0.9921875
Loss  0.015398719
TOTAL EPOCHS: 2000
__________________
For iter  1170
Accuracy  1.0
Loss  0.016362805
TOTAL EPOCHS: 2000
__________________
For iter  1180
Accuracy  1.0
Loss  0.004199963
TOTAL EPOCHS: 2000
__________________
For iter  1190
Accuracy  0.9921875
Loss  0.015893798
TOTAL EPOCHS: 2000
__________________
For iter  1200
Accuracy  0.9765625
Loss  0.032216053
TOTAL EPOCHS: 2000
__________________
For iter  1210
Accuracy  1.0
Loss  0.00539802
TOTAL EPOCHS: 2000
__________________
For iter  1220
Accuracy  1.0
Loss  0.004817738
TOTAL EPOCHS: 2000
__________________
For iter  1230
Accuracy  1.0
Loss  0.008067882
TOTAL EPOCHS: 2000
__________________
For iter  1240
Accuracy  1.0
Loss  0.0067897076
TOTAL EPOCHS: 2000
__________________
For iter  1250
Accuracy  1.0
Loss  0.0053599067
TOTAL EPOCHS: 2000
__________________
For iter  1260
Accuracy  1.0
Loss  0.0075109936
TOTAL EPOCHS: 2000
__________________
For iter  1270
Accuracy  0.9921875
Loss  0.018317472
TOTAL EPOCHS: 2000
__________________
For iter  1280
Accuracy  1.0
Loss  0.011884974
TOTAL EPOCHS: 2000
__________________
For iter  1290
Accuracy  1.0
Loss  0.010953246
TOTAL EPOCHS: 2000
__________________
For iter  1300
Accuracy  1.0
Loss  0.008273397
TOTAL EPOCHS: 2000
__________________
For iter  1310
Accuracy  0.9921875
Loss  0.013726906
TOTAL EPOCHS: 2000
__________________
For iter  1320
Accuracy  1.0
Loss  0.012634102
TOTAL EPOCHS: 2000
__________________
For iter  1330
Accuracy  1.0
Loss  0.012997303
TOTAL EPOCHS: 2000
__________________
For iter  1340
Accuracy  0.9921875
Loss  0.025095165
TOTAL EPOCHS: 2000
__________________
For iter  1350
Accuracy  0.9921875
Loss  0.043606408
TOTAL EPOCHS: 2000
__________________
For iter  1360
Accuracy  0.984375
Loss  0.038462993
TOTAL EPOCHS: 2000
__________________
For iter  1370
Accuracy  0.984375
Loss  0.046309017
TOTAL EPOCHS: 2000
__________________
For iter  1380
Accuracy  0.9921875
Loss  0.04288587
TOTAL EPOCHS: 2000
__________________
For iter  1390
Accuracy  0.9921875
Loss  0.044189386
TOTAL EPOCHS: 2000
__________________
For iter  1400
Accuracy  0.9921875
Loss  0.052297458
TOTAL EPOCHS: 2000
__________________
For iter  1410
Accuracy  0.984375
Loss  0.07748174
TOTAL EPOCHS: 2000
__________________
For iter  1420
Accuracy  0.96875
Loss  0.09878261
TOTAL EPOCHS: 2000
__________________
For iter  1430
Accuracy  0.953125
Loss  0.11644407
TOTAL EPOCHS: 2000
__________________
For iter  1440
Accuracy  0.9921875
Loss  0.049469978
TOTAL EPOCHS: 2000
__________________
For iter  1450
Accuracy  0.9765625
Loss  0.10002804
TOTAL EPOCHS: 2000
__________________
For iter  1460
Accuracy  1.0
Loss  0.048597947
TOTAL EPOCHS: 2000
__________________
For iter  1470
Accuracy  0.9921875
Loss  0.052540213
TOTAL EPOCHS: 2000
__________________
For iter  1480
Accuracy  0.9921875
Loss  0.037293352
TOTAL EPOCHS: 2000
__________________
For iter  1490
Accuracy  0.984375
Loss  0.04916849
TOTAL EPOCHS: 2000
__________________
For iter  1500
Accuracy  0.9296875
Loss  0.16519293
TOTAL EPOCHS: 2000
__________________
For iter  1510
Accuracy  1.0
Loss  0.050582472
TOTAL EPOCHS: 2000
__________________
For iter  1520
Accuracy  0.9609375
Loss  0.0866583
TOTAL EPOCHS: 2000
__________________
For iter  1530
Accuracy  0.984375
Loss  0.10353866
TOTAL EPOCHS: 2000
__________________
For iter  1540
Accuracy  0.9765625
Loss  0.07879171
TOTAL EPOCHS: 2000
__________________
For iter  1550
Accuracy  0.984375
Loss  0.06788343
TOTAL EPOCHS: 2000
__________________
For iter  1560
Accuracy  0.984375
Loss  0.053554334
TOTAL EPOCHS: 2000
__________________
For iter  1570
Accuracy  0.984375
Loss  0.050038107
TOTAL EPOCHS: 2000
__________________
For iter  1580
Accuracy  0.9921875
Loss  0.03484983
TOTAL EPOCHS: 2000
__________________
For iter  1590
Accuracy  0.9921875
Loss  0.03004674
TOTAL EPOCHS: 2000
__________________
For iter  1600
Accuracy  1.0
Loss  0.016692871
TOTAL EPOCHS: 2000
__________________
For iter  1610
Accuracy  1.0
Loss  0.019464767
TOTAL EPOCHS: 2000
__________________
For iter  1620
Accuracy  1.0
Loss  0.014089182
TOTAL EPOCHS: 2000
__________________
For iter  1630
Accuracy  1.0
Loss  0.015771845
TOTAL EPOCHS: 2000
__________________
For iter  1640
Accuracy  0.9921875
Loss  0.016478766
TOTAL EPOCHS: 2000
__________________
For iter  1650
Accuracy  1.0
Loss  0.0075808694
TOTAL EPOCHS: 2000
__________________
For iter  1660
Accuracy  1.0
Loss  0.015293575
TOTAL EPOCHS: 2000
__________________
For iter  1670
Accuracy  0.9921875
Loss  0.018125784
TOTAL EPOCHS: 2000
__________________
For iter  1680
Accuracy  0.9921875
Loss  0.022101736
TOTAL EPOCHS: 2000
__________________
For iter  1690
Accuracy  1.0
Loss  0.012691566
TOTAL EPOCHS: 2000
__________________
For iter  1700
Accuracy  0.9921875
Loss  0.025919173
TOTAL EPOCHS: 2000
__________________
For iter  1710
Accuracy  1.0
Loss  0.016829632
TOTAL EPOCHS: 2000
__________________
For iter  1720
Accuracy  1.0
Loss  0.019152712
TOTAL EPOCHS: 2000
__________________
For iter  1730
Accuracy  1.0
Loss  0.0084683485
TOTAL EPOCHS: 2000
__________________
For iter  1740
Accuracy  1.0
Loss  0.0072650835
TOTAL EPOCHS: 2000
__________________
For iter  1750
Accuracy  1.0
Loss  0.010643456
TOTAL EPOCHS: 2000
__________________
For iter  1760
Accuracy  0.9921875
Loss  0.017566977
TOTAL EPOCHS: 2000
__________________
For iter  1770
Accuracy  1.0
Loss  0.010904387
TOTAL EPOCHS: 2000
__________________
For iter  1780
Accuracy  0.9921875
Loss  0.026341617
TOTAL EPOCHS: 2000
__________________
For iter  1790
Accuracy  1.0
Loss  0.011556795
TOTAL EPOCHS: 2000
__________________
For iter  1800
Accuracy  1.0
Loss  0.008030936
TOTAL EPOCHS: 2000
__________________
For iter  1810
Accuracy  0.9921875
Loss  0.031944007
TOTAL EPOCHS: 2000
__________________
For iter  1820
Accuracy  0.984375
Loss  0.029812044
TOTAL EPOCHS: 2000
__________________
For iter  1830
Accuracy  0.9921875
Loss  0.013592333
TOTAL EPOCHS: 2000
__________________
For iter  1840
Accuracy  1.0
Loss  0.019599328
TOTAL EPOCHS: 2000
__________________
For iter  1850
Accuracy  0.9921875
Loss  0.02519507
TOTAL EPOCHS: 2000
__________________
For iter  1860
Accuracy  1.0
Loss  0.004752999
TOTAL EPOCHS: 2000
__________________
For iter  1870
Accuracy  0.9921875
Loss  0.015839398
TOTAL EPOCHS: 2000
__________________
For iter  1880
Accuracy  1.0
Loss  0.0030322429
TOTAL EPOCHS: 2000
__________________
For iter  1890
Accuracy  0.984375
Loss  0.031941026
TOTAL EPOCHS: 2000
__________________
For iter  1900
Accuracy  1.0
Loss  0.011213537
TOTAL EPOCHS: 2000
__________________
For iter  1910
Accuracy  0.9921875
Loss  0.01459847
TOTAL EPOCHS: 2000
__________________
For iter  1920
Accuracy  1.0
Loss  0.010374948
TOTAL EPOCHS: 2000
__________________
For iter  1930
Accuracy  1.0
Loss  0.005544354
TOTAL EPOCHS: 2000
__________________
For iter  1940
Accuracy  1.0
Loss  0.0047445414
TOTAL EPOCHS: 2000
__________________
For iter  1950
Accuracy  1.0
Loss  0.0033204271
TOTAL EPOCHS: 2000
__________________
For iter  1960
Accuracy  0.9921875
Loss  0.009700938
TOTAL EPOCHS: 2000
__________________
For iter  1970
Accuracy  1.0
Loss  0.003153624
TOTAL EPOCHS: 2000
__________________
For iter  1980
Accuracy  1.0
Loss  0.0074812747
TOTAL EPOCHS: 2000
__________________
For iter  1990
Accuracy  1.0
Loss  0.0019284016
TOTAL EPOCHS: 2000
__________________
Testing Accuracy: 0.7081545

The above output will show us values of accuracy and loss for number od epochs as 500, 1000 and 2000 respectively.

Observation for hyper parameter tuning number of Epochs :

  1. It is observed that for 500 epochs , the model attains a trining accuracy of 79.68% and a testing accuracy of 64.80 %. The overall loss decreses as the number of epochs increase. Additionally, the training acuracy also increases as number of epochs increases

  2. It is observed that for 1000 epochs, the model attains a maximum accuracy of 99.21%, with its very last iteration providing an accuracy of 93.75%. The Testing accuracy reaches a maximum accuracy of 70.02%.

  3. We observe that in the case of 2000 epochs , the network almost plateaus. It reaches an accuracy of 100 % but its testing accuracy is 70.81%. We observe occassional spikes in the loss with the number of epochs , but the overall trend indicates that the loss decreases with the number of epochs. Additonally, it is also observed that, the model attains a 100% accuracy withen the first 250 iterations which could be as we have not set the random seed as well

Maximum Training Accuracy = 100% Maximum testing accuray=70.81% for number of epochs =2000

Final Observation for Hyper Parameter Tuning Number of Epochs

It is observed that for 200 epcohs the network almost plateaus. Hence,The total number of epochs can definetly be consisdered as an important hyper parameter for tuning the RNN LSTM model.

Hyper Parameter Tuning for RNN : Batch Size

Lets tune the batch size next and observe the performance of the network. In the previous cases, we used a batch size of 128. We will observe if there is an impact if we reduce the bacth size and increase the batch size.

Batch sizes selected 4, 512.

Lets have a look at the code for the same. Maximum bits of the code have been reused from the inital model except the batch_size parameter

Reset the tensorflow graphs and load the data as the initial model

In [48]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Next, initialize the various hyper parameters. We create a list for batch size as we need to assess the performance of the network for a batch size of 4 and 512. All the code is reused from the previous model except the batch size. closely see the comments to see the change in the code

In [49]:
reset_graph()
#define constants
#unrolled through 32 time steps
time_steps=32
#hidden LSTM units
num_units=128
#rows of 32 pixels
n_input=32
#learning rate for adam
learning_rate=0.001
# labels=116
n_classes=116
#size of batch
# batch_size is assigned a list wihth desired batch size 4 and 512. We will then iterate through this liss in the tensor flow session
batch_size=[4,512]

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
# forloop to iterate throughthe batch size. After every 800 epcohs it will select the next batch size
    for b in batch_size:
        train_loss=[]
        train_accuracy=[]
        epoch_list=[]
        iter=1
        print("Batch Size:",b)
        while iter<800:
            batch_x,batch_y=next_batch(b,X_train,y_train)

            batch_x=batch_x.reshape((b,time_steps,n_input))

            sess.run(opt, feed_dict={x: batch_x, y: batch_y})

            if iter %10==0:
                epoch_list.append(iter)
                acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
                los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
                train_loss.append(los)
                train_accuracy.append(acc)
                print("For iter ",iter)
                print("Accuracy ",acc)
                print("Loss ",los)
                print("Batch Size:",b)
                print("__________________")
                

            iter=iter+1
        test_data = X_test.reshape((-1, time_steps, n_input))
        print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
        plot_loss_epoch()
        plot_acc_epoch()
num_classes =  116
Batch Size: 4
For iter  10
Accuracy  0.0
Loss  7.307994
Batch Size: 4
__________________
For iter  20
Accuracy  0.0
Loss  5.398633
Batch Size: 4
__________________
For iter  30
Accuracy  0.0
Loss  4.305109
Batch Size: 4
__________________
For iter  40
Accuracy  0.0
Loss  3.630378
Batch Size: 4
__________________
For iter  50
Accuracy  0.0
Loss  5.2119956
Batch Size: 4
__________________
For iter  60
Accuracy  0.0
Loss  5.5067263
Batch Size: 4
__________________
For iter  70
Accuracy  0.0
Loss  3.6926906
Batch Size: 4
__________________
For iter  80
Accuracy  0.0
Loss  4.7284737
Batch Size: 4
__________________
For iter  90
Accuracy  0.0
Loss  4.2230644
Batch Size: 4
__________________
For iter  100
Accuracy  0.0
Loss  4.8498816
Batch Size: 4
__________________
For iter  110
Accuracy  0.25
Loss  3.532615
Batch Size: 4
__________________
For iter  120
Accuracy  0.0
Loss  4.627059
Batch Size: 4
__________________
For iter  130
Accuracy  0.0
Loss  4.31415
Batch Size: 4
__________________
For iter  140
Accuracy  0.25
Loss  3.4319258
Batch Size: 4
__________________
For iter  150
Accuracy  0.0
Loss  4.8465643
Batch Size: 4
__________________
For iter  160
Accuracy  0.25
Loss  4.050275
Batch Size: 4
__________________
For iter  170
Accuracy  0.0
Loss  4.159524
Batch Size: 4
__________________
For iter  180
Accuracy  0.0
Loss  5.0351286
Batch Size: 4
__________________
For iter  190
Accuracy  0.0
Loss  3.7452774
Batch Size: 4
__________________
For iter  200
Accuracy  0.0
Loss  4.918828
Batch Size: 4
__________________
For iter  210
Accuracy  0.0
Loss  4.1725974
Batch Size: 4
__________________
For iter  220
Accuracy  0.0
Loss  4.0521894
Batch Size: 4
__________________
For iter  230
Accuracy  0.0
Loss  3.1947377
Batch Size: 4
__________________
For iter  240
Accuracy  0.0
Loss  4.404345
Batch Size: 4
__________________
For iter  250
Accuracy  0.0
Loss  4.2961903
Batch Size: 4
__________________
For iter  260
Accuracy  0.25
Loss  4.9553823
Batch Size: 4
__________________
For iter  270
Accuracy  0.0
Loss  4.791644
Batch Size: 4
__________________
For iter  280
Accuracy  0.25
Loss  3.2240222
Batch Size: 4
__________________
For iter  290
Accuracy  0.0
Loss  3.616803
Batch Size: 4
__________________
For iter  300
Accuracy  0.0
Loss  3.1959758
Batch Size: 4
__________________
For iter  310
Accuracy  0.0
Loss  4.474883
Batch Size: 4
__________________
For iter  320
Accuracy  0.0
Loss  4.420149
Batch Size: 4
__________________
For iter  330
Accuracy  0.0
Loss  4.071049
Batch Size: 4
__________________
For iter  340
Accuracy  0.25
Loss  3.4433508
Batch Size: 4
__________________
For iter  350
Accuracy  0.25
Loss  3.5193832
Batch Size: 4
__________________
For iter  360
Accuracy  0.25
Loss  3.824061
Batch Size: 4
__________________
For iter  370
Accuracy  0.0
Loss  4.27708
Batch Size: 4
__________________
For iter  380
Accuracy  0.0
Loss  4.11679
Batch Size: 4
__________________
For iter  390
Accuracy  0.25
Loss  4.0292745
Batch Size: 4
__________________
For iter  400
Accuracy  0.25
Loss  3.3003008
Batch Size: 4
__________________
For iter  410
Accuracy  0.25
Loss  2.7654862
Batch Size: 4
__________________
For iter  420
Accuracy  0.0
Loss  3.8091328
Batch Size: 4
__________________
For iter  430
Accuracy  0.0
Loss  4.297285
Batch Size: 4
__________________
For iter  440
Accuracy  0.25
Loss  2.9217975
Batch Size: 4
__________________
For iter  450
Accuracy  0.5
Loss  3.5262656
Batch Size: 4
__________________
For iter  460
Accuracy  0.25
Loss  3.2892272
Batch Size: 4
__________________
For iter  470
Accuracy  0.0
Loss  4.2014747
Batch Size: 4
__________________
For iter  480
Accuracy  0.0
Loss  4.382405
Batch Size: 4
__________________
For iter  490
Accuracy  0.5
Loss  3.41919
Batch Size: 4
__________________
For iter  500
Accuracy  0.0
Loss  3.0907507
Batch Size: 4
__________________
For iter  510
Accuracy  0.0
Loss  3.8097916
Batch Size: 4
__________________
For iter  520
Accuracy  0.0
Loss  4.325842
Batch Size: 4
__________________
For iter  530
Accuracy  0.25
Loss  4.476214
Batch Size: 4
__________________
For iter  540
Accuracy  0.0
Loss  3.2522547
Batch Size: 4
__________________
For iter  550
Accuracy  0.0
Loss  2.7921705
Batch Size: 4
__________________
For iter  560
Accuracy  0.0
Loss  3.611734
Batch Size: 4
__________________
For iter  570
Accuracy  0.5
Loss  2.7435925
Batch Size: 4
__________________
For iter  580
Accuracy  0.25
Loss  2.7878697
Batch Size: 4
__________________
For iter  590
Accuracy  0.0
Loss  3.5906467
Batch Size: 4
__________________
For iter  600
Accuracy  0.25
Loss  2.484641
Batch Size: 4
__________________
For iter  610
Accuracy  0.5
Loss  2.0778618
Batch Size: 4
__________________
For iter  620
Accuracy  0.25
Loss  3.5694242
Batch Size: 4
__________________
For iter  630
Accuracy  0.0
Loss  3.8969097
Batch Size: 4
__________________
For iter  640
Accuracy  0.25
Loss  2.6784523
Batch Size: 4
__________________
For iter  650
Accuracy  0.75
Loss  2.2120662
Batch Size: 4
__________________
For iter  660
Accuracy  0.0
Loss  4.263504
Batch Size: 4
__________________
For iter  670
Accuracy  0.0
Loss  4.3164444
Batch Size: 4
__________________
For iter  680
Accuracy  0.5
Loss  3.1454754
Batch Size: 4
__________________
For iter  690
Accuracy  0.25
Loss  3.313086
Batch Size: 4
__________________
For iter  700
Accuracy  0.25
Loss  3.9669423
Batch Size: 4
__________________
For iter  710
Accuracy  0.0
Loss  3.4745746
Batch Size: 4
__________________
For iter  720
Accuracy  0.0
Loss  3.2038693
Batch Size: 4
__________________
For iter  730
Accuracy  0.5
Loss  2.0654633
Batch Size: 4
__________________
For iter  740
Accuracy  0.25
Loss  3.8705292
Batch Size: 4
__________________
For iter  750
Accuracy  0.5
Loss  2.00604
Batch Size: 4
__________________
For iter  760
Accuracy  0.5
Loss  2.160382
Batch Size: 4
__________________
For iter  770
Accuracy  0.25
Loss  2.739441
Batch Size: 4
__________________
For iter  780
Accuracy  0.0
Loss  3.8597066
Batch Size: 4
__________________
For iter  790
Accuracy  0.0
Loss  3.5302196
Batch Size: 4
__________________
Testing Accuracy: 0.19098713
Batch Size: 512
For iter  10
Accuracy  0.27929688
Loss  2.9154642
Batch Size: 512
__________________
For iter  20
Accuracy  0.28125
Loss  2.7437346
Batch Size: 512
__________________
For iter  30
Accuracy  0.3203125
Loss  2.6869621
Batch Size: 512
__________________
For iter  40
Accuracy  0.30273438
Loss  2.6958818
Batch Size: 512
__________________
For iter  50
Accuracy  0.35546875
Loss  2.5827417
Batch Size: 512
__________________
For iter  60
Accuracy  0.37695312
Loss  2.4579625
Batch Size: 512
__________________
For iter  70
Accuracy  0.34375
Loss  2.4514918
Batch Size: 512
__________________
For iter  80
Accuracy  0.38671875
Loss  2.319903
Batch Size: 512
__________________
For iter  90
Accuracy  0.37890625
Loss  2.3062882
Batch Size: 512
__________________
For iter  100
Accuracy  0.3984375
Loss  2.2207665
Batch Size: 512
__________________
For iter  110
Accuracy  0.39648438
Loss  2.2658095
Batch Size: 512
__________________
For iter  120
Accuracy  0.44726562
Loss  2.0331862
Batch Size: 512
__________________
For iter  130
Accuracy  0.4453125
Loss  2.0401103
Batch Size: 512
__________________
For iter  140
Accuracy  0.46679688
Loss  1.90295
Batch Size: 512
__________________
For iter  150
Accuracy  0.4609375
Loss  1.9440801
Batch Size: 512
__________________
For iter  160
Accuracy  0.46289062
Loss  1.9084314
Batch Size: 512
__________________
For iter  170
Accuracy  0.5136719
Loss  1.8074114
Batch Size: 512
__________________
For iter  180
Accuracy  0.5371094
Loss  1.7495694
Batch Size: 512
__________________
For iter  190
Accuracy  0.5175781
Loss  1.7575783
Batch Size: 512
__________________
For iter  200
Accuracy  0.55859375
Loss  1.6482711
Batch Size: 512
__________________
For iter  210
Accuracy  0.54296875
Loss  1.6145372
Batch Size: 512
__________________
For iter  220
Accuracy  0.5859375
Loss  1.4886053
Batch Size: 512
__________________
For iter  230
Accuracy  0.5527344
Loss  1.5657716
Batch Size: 512
__________________
For iter  240
Accuracy  0.5859375
Loss  1.4576046
Batch Size: 512
__________________
For iter  250
Accuracy  0.5957031
Loss  1.4777918
Batch Size: 512
__________________
For iter  260
Accuracy  0.625
Loss  1.327228
Batch Size: 512
__________________
For iter  270
Accuracy  0.6386719
Loss  1.294385
Batch Size: 512
__________________
For iter  280
Accuracy  0.6582031
Loss  1.1031883
Batch Size: 512
__________________
For iter  290
Accuracy  0.6582031
Loss  1.186335
Batch Size: 512
__________________
For iter  300
Accuracy  0.7089844
Loss  1.1049485
Batch Size: 512
__________________
For iter  310
Accuracy  0.71484375
Loss  1.0269792
Batch Size: 512
__________________
For iter  320
Accuracy  0.71875
Loss  0.9458457
Batch Size: 512
__________________
For iter  330
Accuracy  0.72265625
Loss  0.9450756
Batch Size: 512
__________________
For iter  340
Accuracy  0.72265625
Loss  0.93437517
Batch Size: 512
__________________
For iter  350
Accuracy  0.7011719
Loss  0.97520256
Batch Size: 512
__________________
For iter  360
Accuracy  0.74609375
Loss  0.86328256
Batch Size: 512
__________________
For iter  370
Accuracy  0.7519531
Loss  0.7855749
Batch Size: 512
__________________
For iter  380
Accuracy  0.765625
Loss  0.9479808
Batch Size: 512
__________________
For iter  390
Accuracy  0.7714844
Loss  0.7270131
Batch Size: 512
__________________
For iter  400
Accuracy  0.7890625
Loss  0.70187193
Batch Size: 512
__________________
For iter  410
Accuracy  0.83984375
Loss  0.62793404
Batch Size: 512
__________________
For iter  420
Accuracy  0.7910156
Loss  0.6615738
Batch Size: 512
__________________
For iter  430
Accuracy  0.8261719
Loss  0.5726826
Batch Size: 512
__________________
For iter  440
Accuracy  0.8125
Loss  0.684149
Batch Size: 512
__________________
For iter  450
Accuracy  0.8183594
Loss  0.6073592
Batch Size: 512
__________________
For iter  460
Accuracy  0.8125
Loss  0.62939465
Batch Size: 512
__________________
For iter  470
Accuracy  0.8183594
Loss  0.6404314
Batch Size: 512
__________________
For iter  480
Accuracy  0.84375
Loss  0.54330146
Batch Size: 512
__________________
For iter  490
Accuracy  0.8183594
Loss  0.6159786
Batch Size: 512
__________________
For iter  500
Accuracy  0.8730469
Loss  0.48221877
Batch Size: 512
__________________
For iter  510
Accuracy  0.8457031
Loss  0.4801338
Batch Size: 512
__________________
For iter  520
Accuracy  0.8769531
Loss  0.44872776
Batch Size: 512
__________________
For iter  530
Accuracy  0.8964844
Loss  0.4061752
Batch Size: 512
__________________
For iter  540
Accuracy  0.8496094
Loss  0.4409984
Batch Size: 512
__________________
For iter  550
Accuracy  0.8730469
Loss  0.4541898
Batch Size: 512
__________________
For iter  560
Accuracy  0.8984375
Loss  0.39941603
Batch Size: 512
__________________
For iter  570
Accuracy  0.89453125
Loss  0.35603115
Batch Size: 512
__________________
For iter  580
Accuracy  0.9042969
Loss  0.33046007
Batch Size: 512
__________________
For iter  590
Accuracy  0.9042969
Loss  0.33272955
Batch Size: 512
__________________
For iter  600
Accuracy  0.9082031
Loss  0.34538472
Batch Size: 512
__________________
For iter  610
Accuracy  0.9199219
Loss  0.27043933
Batch Size: 512
__________________
For iter  620
Accuracy  0.890625
Loss  0.32532433
Batch Size: 512
__________________
For iter  630
Accuracy  0.9160156
Loss  0.28137752
Batch Size: 512
__________________
For iter  640
Accuracy  0.9316406
Loss  0.30339122
Batch Size: 512
__________________
For iter  650
Accuracy  0.9316406
Loss  0.24796388
Batch Size: 512
__________________
For iter  660
Accuracy  0.953125
Loss  0.2233472
Batch Size: 512
__________________
For iter  670
Accuracy  0.93359375
Loss  0.23372078
Batch Size: 512
__________________
For iter  680
Accuracy  0.9277344
Loss  0.2362585
Batch Size: 512
__________________
For iter  690
Accuracy  0.9199219
Loss  0.25438192
Batch Size: 512
__________________
For iter  700
Accuracy  0.921875
Loss  0.26075995
Batch Size: 512
__________________
For iter  710
Accuracy  0.9355469
Loss  0.21731079
Batch Size: 512
__________________
For iter  720
Accuracy  0.9394531
Loss  0.18119282
Batch Size: 512
__________________
For iter  730
Accuracy  0.94921875
Loss  0.19850425
Batch Size: 512
__________________
For iter  740
Accuracy  0.9375
Loss  0.20347108
Batch Size: 512
__________________
For iter  750
Accuracy  0.9511719
Loss  0.18552482
Batch Size: 512
__________________
For iter  760
Accuracy  0.9511719
Loss  0.16739492
Batch Size: 512
__________________
For iter  770
Accuracy  0.9628906
Loss  0.16716811
Batch Size: 512
__________________
For iter  780
Accuracy  0.95703125
Loss  0.17821684
Batch Size: 512
__________________
For iter  790
Accuracy  0.9765625
Loss  0.1387721
Batch Size: 512
__________________
Testing Accuracy: 0.68526465

Observation for Hyper Parameter Tuning Batch Size:

  1. It is clearly observed that the batch_size of 4 did not improve the accuracy . Model reached a training accuracy of 75% and Testing accuracy of only 19%.

  2. The Batch size 512 defintely improving the training accuray by 3 % from the bench mark accuracy.

Maximum Training Accuracy for Batch_Size =512 is 97.65% and Testing Accuracy is 68.52%

Final Observation for Batch Size

Hence, icreasing the batch size from 128 to 512 definitely improved the accuracy of the model. Clearly decreasing batch size did not help but increasing the batch size to 512 helped. Hence, batch size is aprospective hyper parametr to tune for a RNN LSTM model, preferable batch size of 512

Hyper Parameter Tuning Number of Neurons for RNN

Number of Neurons selected are 1, 256

Lets first assess the performance of the RNN LSTM Model ith number of neurons as 1.

Number of Neurons

Number of Neurons selected as 1

First reset the graph and load the data as per the initial model

In [49]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Initialize the model with the same hyper parameters as the previous model with only changes in num_units. Set the num_units=1. Refer to comments for the change. and run the Tensorflow session to observe results

In [50]:
reset_graph()
#define constants
#unrolled through 28 time steps
time_steps=32
#hidden LSTM units. Set num_units=1
num_units=1
#rows of 28 pixels
n_input=32
#learning rate for adam
learning_rate=0.001
#mnist is meant to be classified in 10 classes(0-9).
n_classes=116
#size of batch
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    train_loss=[]
    train_accuracy=[]
    epoch_list=[]
    iter=1
    while iter<800:
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            epoch_list.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("Number of Neurons:",num_units)
            print("__________________")
                

        iter=iter+1
    test_data = X_test.reshape((-1, time_steps, n_input))
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
    plot_loss_epoch()
    plot_acc_epoch()
num_classes =  116
For iter  10
Accuracy  0.0078125
Loss  5.2136865
Number of Neurons: 1
__________________
For iter  20
Accuracy  0.015625
Loss  5.1273174
Number of Neurons: 1
__________________
For iter  30
Accuracy  0.015625
Loss  5.174776
Number of Neurons: 1
__________________
For iter  40
Accuracy  0.0234375
Loss  5.084984
Number of Neurons: 1
__________________
For iter  50
Accuracy  0.0390625
Loss  5.1043673
Number of Neurons: 1
__________________
For iter  60
Accuracy  0.0234375
Loss  5.1869335
Number of Neurons: 1
__________________
For iter  70
Accuracy  0.0078125
Loss  5.2635903
Number of Neurons: 1
__________________
For iter  80
Accuracy  0.0078125
Loss  5.141111
Number of Neurons: 1
__________________
For iter  90
Accuracy  0.0234375
Loss  5.16989
Number of Neurons: 1
__________________
For iter  100
Accuracy  0.015625
Loss  5.088705
Number of Neurons: 1
__________________
For iter  110
Accuracy  0.015625
Loss  5.030806
Number of Neurons: 1
__________________
For iter  120
Accuracy  0.015625
Loss  5.048956
Number of Neurons: 1
__________________
For iter  130
Accuracy  0.0078125
Loss  5.1653214
Number of Neurons: 1
__________________
For iter  140
Accuracy  0.0390625
Loss  4.860938
Number of Neurons: 1
__________________
For iter  150
Accuracy  0.0078125
Loss  4.9574094
Number of Neurons: 1
__________________
For iter  160
Accuracy  0.0390625
Loss  4.963353
Number of Neurons: 1
__________________
For iter  170
Accuracy  0.015625
Loss  5.0173264
Number of Neurons: 1
__________________
For iter  180
Accuracy  0.0078125
Loss  5.1711483
Number of Neurons: 1
__________________
For iter  190
Accuracy  0.0
Loss  4.924092
Number of Neurons: 1
__________________
For iter  200
Accuracy  0.0390625
Loss  5.06255
Number of Neurons: 1
__________________
For iter  210
Accuracy  0.015625
Loss  5.1078377
Number of Neurons: 1
__________________
For iter  220
Accuracy  0.0
Loss  4.91474
Number of Neurons: 1
__________________
For iter  230
Accuracy  0.015625
Loss  5.0182667
Number of Neurons: 1
__________________
For iter  240
Accuracy  0.0234375
Loss  4.900587
Number of Neurons: 1
__________________
For iter  250
Accuracy  0.0234375
Loss  4.923621
Number of Neurons: 1
__________________
For iter  260
Accuracy  0.015625
Loss  4.7505713
Number of Neurons: 1
__________________
For iter  270
Accuracy  0.078125
Loss  4.814177
Number of Neurons: 1
__________________
For iter  280
Accuracy  0.0390625
Loss  4.9263077
Number of Neurons: 1
__________________
For iter  290
Accuracy  0.0234375
Loss  4.9262342
Number of Neurons: 1
__________________
For iter  300
Accuracy  0.03125
Loss  4.83368
Number of Neurons: 1
__________________
For iter  310
Accuracy  0.015625
Loss  4.9059668
Number of Neurons: 1
__________________
For iter  320
Accuracy  0.015625
Loss  4.775917
Number of Neurons: 1
__________________
For iter  330
Accuracy  0.0234375
Loss  4.838251
Number of Neurons: 1
__________________
For iter  340
Accuracy  0.0390625
Loss  4.8243637
Number of Neurons: 1
__________________
For iter  350
Accuracy  0.0078125
Loss  4.87374
Number of Neurons: 1
__________________
For iter  360
Accuracy  0.0390625
Loss  4.818504
Number of Neurons: 1
__________________
For iter  370
Accuracy  0.0078125
Loss  4.793249
Number of Neurons: 1
__________________
For iter  380
Accuracy  0.046875
Loss  4.7517014
Number of Neurons: 1
__________________
For iter  390
Accuracy  0.03125
Loss  4.7668967
Number of Neurons: 1
__________________
For iter  400
Accuracy  0.0390625
Loss  4.7295704
Number of Neurons: 1
__________________
For iter  410
Accuracy  0.03125
Loss  4.7570076
Number of Neurons: 1
__________________
For iter  420
Accuracy  0.0234375
Loss  4.8939223
Number of Neurons: 1
__________________
For iter  430
Accuracy  0.0546875
Loss  4.7145166
Number of Neurons: 1
__________________
For iter  440
Accuracy  0.046875
Loss  4.6253386
Number of Neurons: 1
__________________
For iter  450
Accuracy  0.015625
Loss  4.7291107
Number of Neurons: 1
__________________
For iter  460
Accuracy  0.046875
Loss  4.7556915
Number of Neurons: 1
__________________
For iter  470
Accuracy  0.0078125
Loss  4.6467085
Number of Neurons: 1
__________________
For iter  480
Accuracy  0.0078125
Loss  4.8825064
Number of Neurons: 1
__________________
For iter  490
Accuracy  0.0234375
Loss  4.634397
Number of Neurons: 1
__________________
For iter  500
Accuracy  0.0390625
Loss  4.7837634
Number of Neurons: 1
__________________
For iter  510
Accuracy  0.0
Loss  4.708215
Number of Neurons: 1
__________________
For iter  520
Accuracy  0.03125
Loss  4.7047343
Number of Neurons: 1
__________________
For iter  530
Accuracy  0.015625
Loss  4.618808
Number of Neurons: 1
__________________
For iter  540
Accuracy  0.015625
Loss  4.6869307
Number of Neurons: 1
__________________
For iter  550
Accuracy  0.0390625
Loss  4.6117363
Number of Neurons: 1
__________________
For iter  560
Accuracy  0.0390625
Loss  4.5626335
Number of Neurons: 1
__________________
For iter  570
Accuracy  0.015625
Loss  4.548318
Number of Neurons: 1
__________________
For iter  580
Accuracy  0.0
Loss  4.7454486
Number of Neurons: 1
__________________
For iter  590
Accuracy  0.03125
Loss  4.621969
Number of Neurons: 1
__________________
For iter  600
Accuracy  0.015625
Loss  4.5993137
Number of Neurons: 1
__________________
For iter  610
Accuracy  0.0
Loss  4.6116853
Number of Neurons: 1
__________________
For iter  620
Accuracy  0.0234375
Loss  4.520321
Number of Neurons: 1
__________________
For iter  630
Accuracy  0.015625
Loss  4.548872
Number of Neurons: 1
__________________
For iter  640
Accuracy  0.0390625
Loss  4.5803375
Number of Neurons: 1
__________________
For iter  650
Accuracy  0.046875
Loss  4.5049295
Number of Neurons: 1
__________________
For iter  660
Accuracy  0.03125
Loss  4.4039526
Number of Neurons: 1
__________________
For iter  670
Accuracy  0.0390625
Loss  4.3997574
Number of Neurons: 1
__________________
For iter  680
Accuracy  0.046875
Loss  4.513756
Number of Neurons: 1
__________________
For iter  690
Accuracy  0.0390625
Loss  4.42568
Number of Neurons: 1
__________________
For iter  700
Accuracy  0.015625
Loss  4.643255
Number of Neurons: 1
__________________
For iter  710
Accuracy  0.0390625
Loss  4.5548763
Number of Neurons: 1
__________________
For iter  720
Accuracy  0.03125
Loss  4.5313745
Number of Neurons: 1
__________________
For iter  730
Accuracy  0.03125
Loss  4.585597
Number of Neurons: 1
__________________
For iter  740
Accuracy  0.03125
Loss  4.4897575
Number of Neurons: 1
__________________
For iter  750
Accuracy  0.015625
Loss  4.428042
Number of Neurons: 1
__________________
For iter  760
Accuracy  0.03125
Loss  4.4173913
Number of Neurons: 1
__________________
For iter  770
Accuracy  0.0078125
Loss  4.49325
Number of Neurons: 1
__________________
For iter  780
Accuracy  0.046875
Loss  4.3820972
Number of Neurons: 1
__________________
For iter  790
Accuracy  0.015625
Loss  4.5163546
Number of Neurons: 1
__________________
Testing Accuracy: 0.027896997

Observation for Number of Neurons for Hyper Parameter Tuning

It is observed that just using one Neuron directly affected the accuracy. The Accuracy has a varying trend reaching a high of 80% and a low testing accuracy of 2.7%. Hence, as the number of neurons decreses the accuracy also decreases. It also observed that the accuracy sometimes reaches a high of 80% which is one spike in the data that occurs at 280 or 290th epoch.

Hyper Parameter Tuning for RNN : Number of Neurons

Number of Neurons selected are 1, 512

Number of Neurons

Number of Neurons selected as 512

Lets check the accuracy for number of neurons as 512. The initla code to load the data and reset the graph remains the samefrom the previous model

In [66]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Initialize the parameters as per the previous initial model. Just edit the num_units as 512. Observe the comments to see the change

In [67]:
reset_graph()
#define constants
#unrolled through 28 time steps
time_steps=32
#hidden LSTM units. Edit the num_units as 512
num_units=512
#rows of 28 pixels
n_input=32
#learning rate for adam
learning_rate=0.001
#mnist is meant to be classified in 10 classes(0-9).
n_classes=116
#size of batch 
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    train_loss=[]
    train_accuracy=[]
    epoch_list=[]
    iter=1
    while iter<800:
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            epoch_list.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("Number of Neurons:",num_units)
            print("__________________")
                

        iter=iter+1
    test_data = X_test.reshape((-1, time_steps, n_input))
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
    plot_loss_epoch()
    plot_acc_epoch()
num_classes =  116
For iter  10
Accuracy  0.03125
Loss  5.873425
Number of Neurons: 512
__________________
For iter  20
Accuracy  0.09375
Loss  4.451789
Number of Neurons: 512
__________________
For iter  30
Accuracy  0.125
Loss  3.6729457
Number of Neurons: 512
__________________
For iter  40
Accuracy  0.1875
Loss  3.6211452
Number of Neurons: 512
__________________
For iter  50
Accuracy  0.265625
Loss  3.079918
Number of Neurons: 512
__________________
For iter  60
Accuracy  0.1875
Loss  3.1663487
Number of Neurons: 512
__________________
For iter  70
Accuracy  0.21875
Loss  2.685792
Number of Neurons: 512
__________________
For iter  80
Accuracy  0.3359375
Loss  2.4851573
Number of Neurons: 512
__________________
For iter  90
Accuracy  0.3828125
Loss  2.4655735
Number of Neurons: 512
__________________
For iter  100
Accuracy  0.40625
Loss  2.3533006
Number of Neurons: 512
__________________
For iter  110
Accuracy  0.4375
Loss  1.8760993
Number of Neurons: 512
__________________
For iter  120
Accuracy  0.5234375
Loss  1.9086113
Number of Neurons: 512
__________________
For iter  130
Accuracy  0.5859375
Loss  1.6255774
Number of Neurons: 512
__________________
For iter  140
Accuracy  0.578125
Loss  1.3331031
Number of Neurons: 512
__________________
For iter  150
Accuracy  0.6328125
Loss  1.347614
Number of Neurons: 512
__________________
For iter  160
Accuracy  0.578125
Loss  1.3294617
Number of Neurons: 512
__________________
For iter  170
Accuracy  0.65625
Loss  1.1038976
Number of Neurons: 512
__________________
For iter  180
Accuracy  0.6875
Loss  1.1053375
Number of Neurons: 512
__________________
For iter  190
Accuracy  0.6640625
Loss  1.1095529
Number of Neurons: 512
__________________
For iter  200
Accuracy  0.7265625
Loss  0.9374541
Number of Neurons: 512
__________________
For iter  210
Accuracy  0.6640625
Loss  0.9732061
Number of Neurons: 512
__________________
For iter  220
Accuracy  0.6640625
Loss  0.9885645
Number of Neurons: 512
__________________
For iter  230
Accuracy  0.7734375
Loss  0.8288075
Number of Neurons: 512
__________________
For iter  240
Accuracy  0.8203125
Loss  0.6380265
Number of Neurons: 512
__________________
For iter  250
Accuracy  0.7890625
Loss  0.5527607
Number of Neurons: 512
__________________
For iter  260
Accuracy  0.7578125
Loss  0.7701353
Number of Neurons: 512
__________________
For iter  270
Accuracy  0.6484375
Loss  1.054456
Number of Neurons: 512
__________________
For iter  280
Accuracy  0.828125
Loss  0.66557443
Number of Neurons: 512
__________________
For iter  290
Accuracy  0.6875
Loss  0.81858593
Number of Neurons: 512
__________________
For iter  300
Accuracy  0.828125
Loss  0.54572934
Number of Neurons: 512
__________________
For iter  310
Accuracy  0.8046875
Loss  0.56968987
Number of Neurons: 512
__________________
For iter  320
Accuracy  0.8359375
Loss  0.5880608
Number of Neurons: 512
__________________
For iter  330
Accuracy  0.828125
Loss  0.5807699
Number of Neurons: 512
__________________
For iter  340
Accuracy  0.8828125
Loss  0.42656398
Number of Neurons: 512
__________________
For iter  350
Accuracy  0.8359375
Loss  0.5299959
Number of Neurons: 512
__________________
For iter  360
Accuracy  0.8359375
Loss  0.56395394
Number of Neurons: 512
__________________
For iter  370
Accuracy  0.8125
Loss  0.5777482
Number of Neurons: 512
__________________
For iter  380
Accuracy  0.8984375
Loss  0.37397993
Number of Neurons: 512
__________________
For iter  390
Accuracy  0.8671875
Loss  0.41461852
Number of Neurons: 512
__________________
For iter  400
Accuracy  0.8671875
Loss  0.36999387
Number of Neurons: 512
__________________
For iter  410
Accuracy  0.859375
Loss  0.3920861
Number of Neurons: 512
__________________
For iter  420
Accuracy  0.8671875
Loss  0.37264097
Number of Neurons: 512
__________________
For iter  430
Accuracy  0.859375
Loss  0.4669093
Number of Neurons: 512
__________________
For iter  440
Accuracy  0.84375
Loss  0.4197589
Number of Neurons: 512
__________________
For iter  450
Accuracy  0.921875
Loss  0.32214105
Number of Neurons: 512
__________________
For iter  460
Accuracy  0.9375
Loss  0.2169623
Number of Neurons: 512
__________________
For iter  470
Accuracy  0.875
Loss  0.3066223
Number of Neurons: 512
__________________
For iter  480
Accuracy  0.890625
Loss  0.2690618
Number of Neurons: 512
__________________
For iter  490
Accuracy  0.921875
Loss  0.2376353
Number of Neurons: 512
__________________
For iter  500
Accuracy  0.875
Loss  0.30260095
Number of Neurons: 512
__________________
For iter  510
Accuracy  0.921875
Loss  0.21314469
Number of Neurons: 512
__________________
For iter  520
Accuracy  0.8984375
Loss  0.30547574
Number of Neurons: 512
__________________
For iter  530
Accuracy  0.890625
Loss  0.37738413
Number of Neurons: 512
__________________
For iter  540
Accuracy  0.890625
Loss  0.35259834
Number of Neurons: 512
__________________
For iter  550
Accuracy  0.90625
Loss  0.2896911
Number of Neurons: 512
__________________
For iter  560
Accuracy  0.890625
Loss  0.32887644
Number of Neurons: 512
__________________
For iter  570
Accuracy  0.90625
Loss  0.27483094
Number of Neurons: 512
__________________
For iter  580
Accuracy  0.875
Loss  0.28852004
Number of Neurons: 512
__________________
For iter  590
Accuracy  0.9453125
Loss  0.18163723
Number of Neurons: 512
__________________
For iter  600
Accuracy  0.8984375
Loss  0.2824292
Number of Neurons: 512
__________________
For iter  610
Accuracy  0.921875
Loss  0.21057409
Number of Neurons: 512
__________________
For iter  620
Accuracy  0.9453125
Loss  0.20019212
Number of Neurons: 512
__________________
For iter  630
Accuracy  0.96875
Loss  0.1435095
Number of Neurons: 512
__________________
For iter  640
Accuracy  0.9609375
Loss  0.1349816
Number of Neurons: 512
__________________
For iter  650
Accuracy  0.9453125
Loss  0.16929372
Number of Neurons: 512
__________________
For iter  660
Accuracy  0.953125
Loss  0.17068726
Number of Neurons: 512
__________________
For iter  670
Accuracy  0.921875
Loss  0.20070507
Number of Neurons: 512
__________________
For iter  680
Accuracy  0.9296875
Loss  0.17595439
Number of Neurons: 512
__________________
For iter  690
Accuracy  0.9296875
Loss  0.1999127
Number of Neurons: 512
__________________
For iter  700
Accuracy  0.9296875
Loss  0.20232867
Number of Neurons: 512
__________________
For iter  710
Accuracy  0.96875
Loss  0.11253822
Number of Neurons: 512
__________________
For iter  720
Accuracy  0.9765625
Loss  0.13531461
Number of Neurons: 512
__________________
For iter  730
Accuracy  0.921875
Loss  0.16392682
Number of Neurons: 512
__________________
For iter  740
Accuracy  0.9375
Loss  0.18335673
Number of Neurons: 512
__________________
For iter  750
Accuracy  0.953125
Loss  0.16964927
Number of Neurons: 512
__________________
For iter  760
Accuracy  0.9765625
Loss  0.094717644
Number of Neurons: 512
__________________
For iter  770
Accuracy  0.9609375
Loss  0.11069293
Number of Neurons: 512
__________________
For iter  780
Accuracy  0.9375
Loss  0.13501516
Number of Neurons: 512
__________________
For iter  790
Accuracy  0.953125
Loss  0.17977408
Number of Neurons: 512
__________________
Testing Accuracy: 0.7181688

Observation for number of neurons as 512

It is observaed that there is a considerable increase in accuracy with the increase in number of neurons. When number of neurons increased from 128 to 500 there is an increase in accuracy , surpassing the accuracy of the very first model. The testing accuracy increased further to 71.81 which is a considerable increase in testing accuracy. Hence, increasing the number of neurons did improve the accuracy considerably.

Training Accuracy= 97.65% Testing accuracy= 71.81%

Final Observation for Number of Neurons Hyper Parameter Tuning

It is observed that reducing he number of neurons did not help the performance. Infact it decreased the performance of the nwtrok. Increase the number of neuron to 512 improved the accuracy of the testing set which is a good sign.

Hence, number of neurons can defintely be considered as a prospective parameter to tune a RNN LSTM Model. Moreover for this a number of neurons 512 worked the best.

Hyper Parameter Tuning for combination of Learning Rate and Number of Neurons

We will now tune the Learning rate to values 0.1,0.001 to see if the learning rate impacts the performance of the network

Learning Rate

Lets start woth learning rate=0.1 and number of neurons as 512. As before the code to reset the graph and load the data remains the same as the initla model.

In [71]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Next initlaize the hyper parameters as per the initial model. Edit the learning rate to 0.1 nad number of neurons as 512

In [72]:
reset_graph()
#define constants
#unrolled through 28 time steps
time_steps=32
#hidden LSTM units
num_units=512
#rows of 28 pixels
n_input=32
#learning rate for adam
learning_rate=0.1
#mnist is meant to be classified in 10 classes(0-9).
n_classes=116
#size of batch
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    train_loss=[]
    train_accuracy=[]
    epoch_list=[]
    iter=1
    while iter<800:
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            epoch_list.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("Learning rate:",learning_rate)
            print("__________________")
                

        iter=iter+1
    test_data = X_test.reshape((-1, time_steps, n_input))
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
    plot_loss_epoch()
    plot_acc_epoch()
num_classes =  116
For iter  10
Accuracy  0.0234375
Loss  52.117012
Learning rate: 0.1
__________________
For iter  20
Accuracy  0.0078125
Loss  107.82073
Learning rate: 0.1
__________________
For iter  30
Accuracy  0.0078125
Loss  142.43036
Learning rate: 0.1
__________________
For iter  40
Accuracy  0.0078125
Loss  175.2469
Learning rate: 0.1
__________________
For iter  50
Accuracy  0.03125
Loss  136.81943
Learning rate: 0.1
__________________
For iter  60
Accuracy  0.015625
Loss  76.703545
Learning rate: 0.1
__________________
For iter  70
Accuracy  0.03125
Loss  76.71801
Learning rate: 0.1
__________________
For iter  80
Accuracy  0.0
Loss  58.475456
Learning rate: 0.1
__________________
For iter  90
Accuracy  0.0078125
Loss  34.853924
Learning rate: 0.1
__________________
For iter  100
Accuracy  0.0234375
Loss  34.136166
Learning rate: 0.1
__________________
For iter  110
Accuracy  0.0234375
Loss  25.673159
Learning rate: 0.1
__________________
For iter  120
Accuracy  0.0234375
Loss  21.944859
Learning rate: 0.1
__________________
For iter  130
Accuracy  0.0078125
Loss  18.148312
Learning rate: 0.1
__________________
For iter  140
Accuracy  0.0078125
Loss  10.36092
Learning rate: 0.1
__________________
For iter  150
Accuracy  0.0390625
Loss  9.531163
Learning rate: 0.1
__________________
For iter  160
Accuracy  0.0234375
Loss  8.972416
Learning rate: 0.1
__________________
For iter  170
Accuracy  0.03125
Loss  7.3656406
Learning rate: 0.1
__________________
For iter  180
Accuracy  0.046875
Loss  6.076088
Learning rate: 0.1
__________________
For iter  190
Accuracy  0.03125
Loss  7.28172
Learning rate: 0.1
__________________
For iter  200
Accuracy  0.0390625
Loss  8.102804
Learning rate: 0.1
__________________
For iter  210
Accuracy  0.0625
Loss  5.477697
Learning rate: 0.1
__________________
For iter  220
Accuracy  0.0234375
Loss  5.2511992
Learning rate: 0.1
__________________
For iter  230
Accuracy  0.0625
Loss  5.894226
Learning rate: 0.1
__________________
For iter  240
Accuracy  0.015625
Loss  4.5760827
Learning rate: 0.1
__________________
For iter  250
Accuracy  0.0546875
Loss  5.115344
Learning rate: 0.1
__________________
For iter  260
Accuracy  0.046875
Loss  6.4027014
Learning rate: 0.1
__________________
For iter  270
Accuracy  0.0859375
Loss  5.433777
Learning rate: 0.1
__________________
For iter  280
Accuracy  0.046875
Loss  6.441272
Learning rate: 0.1
__________________
For iter  290
Accuracy  0.0546875
Loss  5.2695603
Learning rate: 0.1
__________________
For iter  300
Accuracy  0.0625
Loss  5.2867937
Learning rate: 0.1
__________________
For iter  310
Accuracy  0.0390625
Loss  5.965334
Learning rate: 0.1
__________________
For iter  320
Accuracy  0.0390625
Loss  9.648063
Learning rate: 0.1
__________________
For iter  330
Accuracy  0.0390625
Loss  5.8993444
Learning rate: 0.1
__________________
For iter  340
Accuracy  0.1015625
Loss  5.4465914
Learning rate: 0.1
__________________
For iter  350
Accuracy  0.078125
Loss  5.579677
Learning rate: 0.1
__________________
For iter  360
Accuracy  0.0703125
Loss  6.6275244
Learning rate: 0.1
__________________
For iter  370
Accuracy  0.0390625
Loss  6.7563233
Learning rate: 0.1
__________________
For iter  380
Accuracy  0.0390625
Loss  7.439664
Learning rate: 0.1
__________________
For iter  390
Accuracy  0.078125
Loss  9.338428
Learning rate: 0.1
__________________
For iter  400
Accuracy  0.0390625
Loss  8.316489
Learning rate: 0.1
__________________
For iter  410
Accuracy  0.0390625
Loss  8.321957
Learning rate: 0.1
__________________
For iter  420
Accuracy  0.0390625
Loss  7.1907105
Learning rate: 0.1
__________________
For iter  430
Accuracy  0.015625
Loss  6.9564886
Learning rate: 0.1
__________________
For iter  440
Accuracy  0.0546875
Loss  10.338965
Learning rate: 0.1
__________________
For iter  450
Accuracy  0.046875
Loss  8.15852
Learning rate: 0.1
__________________
For iter  460
Accuracy  0.0234375
Loss  7.844981
Learning rate: 0.1
__________________
For iter  470
Accuracy  0.0390625
Loss  11.295022
Learning rate: 0.1
__________________
For iter  480
Accuracy  0.0625
Loss  10.578598
Learning rate: 0.1
__________________
For iter  490
Accuracy  0.0234375
Loss  10.316763
Learning rate: 0.1
__________________
For iter  500
Accuracy  0.0234375
Loss  14.300995
Learning rate: 0.1
__________________
For iter  510
Accuracy  0.0
Loss  15.292046
Learning rate: 0.1
__________________
For iter  520
Accuracy  0.046875
Loss  10.562048
Learning rate: 0.1
__________________
For iter  530
Accuracy  0.046875
Loss  13.529181
Learning rate: 0.1
__________________
For iter  540
Accuracy  0.015625
Loss  17.885534
Learning rate: 0.1
__________________
For iter  550
Accuracy  0.0546875
Loss  15.73193
Learning rate: 0.1
__________________
For iter  560
Accuracy  0.0703125
Loss  17.53782
Learning rate: 0.1
__________________
For iter  570
Accuracy  0.03125
Loss  17.331608
Learning rate: 0.1
__________________
For iter  580
Accuracy  0.03125
Loss  18.354908
Learning rate: 0.1
__________________
For iter  590
Accuracy  0.0390625
Loss  17.918724
Learning rate: 0.1
__________________
For iter  600
Accuracy  0.0703125
Loss  22.128895
Learning rate: 0.1
__________________
For iter  610
Accuracy  0.03125
Loss  21.728928
Learning rate: 0.1
__________________
For iter  620
Accuracy  0.046875
Loss  19.738476
Learning rate: 0.1
__________________
For iter  630
Accuracy  0.015625
Loss  23.425655
Learning rate: 0.1
__________________
For iter  640
Accuracy  0.09375
Loss  22.092487
Learning rate: 0.1
__________________
For iter  650
Accuracy  0.0703125
Loss  26.6432
Learning rate: 0.1
__________________
For iter  660
Accuracy  0.0
Loss  29.96331
Learning rate: 0.1
__________________
For iter  670
Accuracy  0.0703125
Loss  29.637047
Learning rate: 0.1
__________________
For iter  680
Accuracy  0.0625
Loss  28.435314
Learning rate: 0.1
__________________
For iter  690
Accuracy  0.0546875
Loss  26.847317
Learning rate: 0.1
__________________
For iter  700
Accuracy  0.0234375
Loss  29.169308
Learning rate: 0.1
__________________
For iter  710
Accuracy  0.0546875
Loss  27.47713
Learning rate: 0.1
__________________
For iter  720
Accuracy  0.015625
Loss  30.402163
Learning rate: 0.1
__________________
For iter  730
Accuracy  0.0390625
Loss  35.875957
Learning rate: 0.1
__________________
For iter  740
Accuracy  0.0390625
Loss  34.65452
Learning rate: 0.1
__________________
For iter  750
Accuracy  0.0234375
Loss  36.007603
Learning rate: 0.1
__________________
For iter  760
Accuracy  0.0078125
Loss  40.36262
Learning rate: 0.1
__________________
For iter  770
Accuracy  0.0546875
Loss  33.0464
Learning rate: 0.1
__________________
For iter  780
Accuracy  0.03125
Loss  37.919632
Learning rate: 0.1
__________________
For iter  790
Accuracy  0.0234375
Loss  35.551537
Learning rate: 0.1
__________________
Testing Accuracy: 0.02646638

Observation

Increasing the learning rate , provided a varied accuracy in the training. There are varied spikes in the data , where the accuracy also reached an overall 100% as well. But it is not possible to distinguish if increases the learning rate has helped the RNN LSTM model. Lets observe what happens when the learning rate is decreased. This means that the algorithm is going to learn slowly but navigate the loss scenario slowly.

Highest Trainig Accuray = 100%(but unstable) Testing accuracy=2.6%

Hyper Parameter Tuning for RNN : Learning Rate

Next, we will tune the model for learning rate=0.0001 and batch size=128. Code remains the same as the initial model for loading the datatset.

In [73]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Reuse the code for initializing hyper parameters as the initial model. Only edit the learning rate to be 0.0001 and num_units as 512.

In [74]:
reset_graph()
#define constants
#unrolled through 28 time steps
time_steps=32
#hidden LSTM units
num_units=512
#rows of 28 pixels
n_input=32
#learning rate for adam
learning_rate=0.0001
#mnist is meant to be classified in 10 classes(0-9).
n_classes=116
#size of batch
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    train_loss=[]
    train_accuracy=[]
    epoch_list=[]
    iter=1
    while iter<800:
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            epoch_list.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("Learning rate:",learning_rate)
            print("__________________")
                

        iter=iter+1
    test_data = X_test.reshape((-1, time_steps, n_input))
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
    plot_loss_epoch()
    plot_acc_epoch()
num_classes =  116
For iter  10
Accuracy  0.03125
Loss  4.5631466
Learning rate: 0.0001
__________________
For iter  20
Accuracy  0.03125
Loss  3.989932
Learning rate: 0.0001
__________________
For iter  30
Accuracy  0.109375
Loss  3.807466
Learning rate: 0.0001
__________________
For iter  40
Accuracy  0.109375
Loss  3.818472
Learning rate: 0.0001
__________________
For iter  50
Accuracy  0.171875
Loss  3.3791814
Learning rate: 0.0001
__________________
For iter  60
Accuracy  0.125
Loss  3.6153376
Learning rate: 0.0001
__________________
For iter  70
Accuracy  0.1640625
Loss  3.326272
Learning rate: 0.0001
__________________
For iter  80
Accuracy  0.2109375
Loss  3.1809123
Learning rate: 0.0001
__________________
For iter  90
Accuracy  0.1953125
Loss  3.1856256
Learning rate: 0.0001
__________________
For iter  100
Accuracy  0.21875
Loss  2.9985957
Learning rate: 0.0001
__________________
For iter  110
Accuracy  0.2890625
Loss  2.511023
Learning rate: 0.0001
__________________
For iter  120
Accuracy  0.359375
Loss  2.7589455
Learning rate: 0.0001
__________________
For iter  130
Accuracy  0.328125
Loss  2.3308449
Learning rate: 0.0001
__________________
For iter  140
Accuracy  0.4140625
Loss  2.2295265
Learning rate: 0.0001
__________________
For iter  150
Accuracy  0.4765625
Loss  2.0780296
Learning rate: 0.0001
__________________
For iter  160
Accuracy  0.453125
Loss  2.0138664
Learning rate: 0.0001
__________________
For iter  170
Accuracy  0.5078125
Loss  1.9225918
Learning rate: 0.0001
__________________
For iter  180
Accuracy  0.53125
Loss  1.7968841
Learning rate: 0.0001
__________________
For iter  190
Accuracy  0.515625
Loss  1.8976163
Learning rate: 0.0001
__________________
For iter  200
Accuracy  0.578125
Loss  1.6391277
Learning rate: 0.0001
__________________
For iter  210
Accuracy  0.5390625
Loss  1.6584065
Learning rate: 0.0001
__________________
For iter  220
Accuracy  0.4921875
Loss  1.8070487
Learning rate: 0.0001
__________________
For iter  230
Accuracy  0.578125
Loss  1.5319884
Learning rate: 0.0001
__________________
For iter  240
Accuracy  0.5546875
Loss  1.3935498
Learning rate: 0.0001
__________________
For iter  250
Accuracy  0.5703125
Loss  1.3325589
Learning rate: 0.0001
__________________
For iter  260
Accuracy  0.6640625
Loss  1.1032386
Learning rate: 0.0001
__________________
For iter  270
Accuracy  0.59375
Loss  1.7600543
Learning rate: 0.0001
__________________
For iter  280
Accuracy  0.671875
Loss  1.3048396
Learning rate: 0.0001
__________________
For iter  290
Accuracy  0.578125
Loss  1.5710223
Learning rate: 0.0001
__________________
For iter  300
Accuracy  0.671875
Loss  0.9970095
Learning rate: 0.0001
__________________
For iter  310
Accuracy  0.6640625
Loss  1.0392046
Learning rate: 0.0001
__________________
For iter  320
Accuracy  0.65625
Loss  1.1995621
Learning rate: 0.0001
__________________
For iter  330
Accuracy  0.6484375
Loss  1.1808105
Learning rate: 0.0001
__________________
For iter  340
Accuracy  0.7890625
Loss  0.8395099
Learning rate: 0.0001
__________________
For iter  350
Accuracy  0.6875
Loss  1.0145433
Learning rate: 0.0001
__________________
For iter  360
Accuracy  0.7109375
Loss  1.0068942
Learning rate: 0.0001
__________________
For iter  370
Accuracy  0.7265625
Loss  1.0970731
Learning rate: 0.0001
__________________
For iter  380
Accuracy  0.7109375
Loss  0.9958595
Learning rate: 0.0001
__________________
For iter  390
Accuracy  0.75
Loss  0.9365319
Learning rate: 0.0001
__________________
For iter  400
Accuracy  0.6640625
Loss  0.974906
Learning rate: 0.0001
__________________
For iter  410
Accuracy  0.765625
Loss  1.023454
Learning rate: 0.0001
__________________
For iter  420
Accuracy  0.734375
Loss  0.89517665
Learning rate: 0.0001
__________________
For iter  430
Accuracy  0.7265625
Loss  0.9083204
Learning rate: 0.0001
__________________
For iter  440
Accuracy  0.734375
Loss  0.8739385
Learning rate: 0.0001
__________________
For iter  450
Accuracy  0.8125
Loss  0.7294475
Learning rate: 0.0001
__________________
For iter  460
Accuracy  0.796875
Loss  0.7134402
Learning rate: 0.0001
__________________
For iter  470
Accuracy  0.7421875
Loss  0.74107534
Learning rate: 0.0001
__________________
For iter  480
Accuracy  0.7734375
Loss  0.8300466
Learning rate: 0.0001
__________________
For iter  490
Accuracy  0.8125
Loss  0.6525448
Learning rate: 0.0001
__________________
For iter  500
Accuracy  0.7265625
Loss  0.8290632
Learning rate: 0.0001
__________________
For iter  510
Accuracy  0.828125
Loss  0.63543355
Learning rate: 0.0001
__________________
For iter  520
Accuracy  0.8046875
Loss  0.7640125
Learning rate: 0.0001
__________________
For iter  530
Accuracy  0.7265625
Loss  0.8020152
Learning rate: 0.0001
__________________
For iter  540
Accuracy  0.75
Loss  0.9451337
Learning rate: 0.0001
__________________
For iter  550
Accuracy  0.7265625
Loss  0.98019266
Learning rate: 0.0001
__________________
For iter  560
Accuracy  0.7109375
Loss  0.9853168
Learning rate: 0.0001
__________________
For iter  570
Accuracy  0.7890625
Loss  0.6896689
Learning rate: 0.0001
__________________
For iter  580
Accuracy  0.7578125
Loss  0.5844698
Learning rate: 0.0001
__________________
For iter  590
Accuracy  0.828125
Loss  0.5939628
Learning rate: 0.0001
__________________
For iter  600
Accuracy  0.7578125
Loss  0.7810643
Learning rate: 0.0001
__________________
For iter  610
Accuracy  0.84375
Loss  0.54100776
Learning rate: 0.0001
__________________
For iter  620
Accuracy  0.7578125
Loss  0.6011753
Learning rate: 0.0001
__________________
For iter  630
Accuracy  0.7890625
Loss  0.5848334
Learning rate: 0.0001
__________________
For iter  640
Accuracy  0.8125
Loss  0.6317098
Learning rate: 0.0001
__________________
For iter  650
Accuracy  0.765625
Loss  0.5345421
Learning rate: 0.0001
__________________
For iter  660
Accuracy  0.8046875
Loss  0.70984924
Learning rate: 0.0001
__________________
For iter  670
Accuracy  0.7578125
Loss  0.57907075
Learning rate: 0.0001
__________________
For iter  680
Accuracy  0.8828125
Loss  0.43478602
Learning rate: 0.0001
__________________
For iter  690
Accuracy  0.8046875
Loss  0.6819224
Learning rate: 0.0001
__________________
For iter  700
Accuracy  0.8046875
Loss  0.58306175
Learning rate: 0.0001
__________________
For iter  710
Accuracy  0.8203125
Loss  0.5587112
Learning rate: 0.0001
__________________
For iter  720
Accuracy  0.8359375
Loss  0.66828763
Learning rate: 0.0001
__________________
For iter  730
Accuracy  0.8203125
Loss  0.50937885
Learning rate: 0.0001
__________________
For iter  740
Accuracy  0.8359375
Loss  0.5306647
Learning rate: 0.0001
__________________
For iter  750
Accuracy  0.8359375
Loss  0.5107412
Learning rate: 0.0001
__________________
For iter  760
Accuracy  0.859375
Loss  0.451926
Learning rate: 0.0001
__________________
For iter  770
Accuracy  0.859375
Loss  0.46211195
Learning rate: 0.0001
__________________
For iter  780
Accuracy  0.8359375
Loss  0.40955555
Learning rate: 0.0001
__________________
For iter  790
Accuracy  0.859375
Loss  0.42612824
Learning rate: 0.0001
__________________
Testing Accuracy: 0.65450644

Observation

After increasing the learning rate it is observed that, the accuracy increases in comparison to decreasing the learnin rate. However, it does not improve the accuracy over the bench mark accuracy of 94.53% and a testing accuracy of 68.59%. Hence decreasing the learning rate did help but not considerable. The optimal performanceof the network was at a learning rate of 0.001 as per the very first model before performing hyper parameter tuning.

Final Observation with Hyper Parameter combination Learning Rate and Number of Neurons

It is observed that decreasing the learning rate to 0.0001 and number of neurons has 512 provided an increase in accuracy in comparison to a learning rate of 0.1 and number of neurons as 512. But the most desirable result was at a leanring rate of 0.001 and number of neurons as 128.

Hence, though the learning rate affected the performance of the model with number of neurons as 128 . This combination would be considered a less prospective hyper parameter ti imporve accuracy of the RNN LSTM Model. However, learning rate by it self could be considered a prospective parameter to tune for the RNN LSTM model

Hyper Parameter Tuning Optimizer for RNN :

In this section we will tune the Optimizer with values RMSProp, Adagrad

Lets start by training the Optimizer -Adagrad. Reuse the code from the initial RNN LSTM MOdel model for restting the graph and loading the data.

In [79]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Initialize the hyper parameters as per the inital model. Ensure to change the optimizer to Adagrad optimizer. Closely see the comments to notice the change

In [80]:
reset_graph()
#define constants
#unrolled through 32 time steps
time_steps=32
#hidden LSTM units
num_units=512
#rows of 32 pixels
n_input=32
#learning rate for adam
learning_rate=0.0001

n_classes=116
#size of batch
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
## Change optimizer to Adagrad
opt=tf.train.AdagradOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    train_loss=[]
    train_accuracy=[]
    epoch_list=[]
    iter=1
    while iter<800:
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            epoch_list.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("Optimizer: Adagrad")
            print("__________________")
                

        iter=iter+1
    test_data = X_test.reshape((-1, time_steps, n_input))
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
    plot_loss_epoch()
    plot_acc_epoch()
num_classes =  116
Exception ignored in: <bound method ScopedTFStatus.__del__ of <tensorflow.python.framework.c_api_util.ScopedTFStatus object at 0x00000148A6D54E80>>
Traceback (most recent call last):
  File "C:\Users\jaini\Anaconda3\lib\site-packages\tensorflow\python\framework\c_api_util.py", line 37, in __del__
    c_api.TF_DeleteStatus(self.status)
AttributeError: 'ScopedTFStatus' object has no attribute 'status'
For iter  10
Accuracy  0.0078125
Loss  5.708848
Optimizer: Adagrad
__________________
For iter  20
Accuracy  0.03125
Loss  5.513981
Optimizer: Adagrad
__________________
For iter  30
Accuracy  0.0234375
Loss  5.3684454
Optimizer: Adagrad
__________________
For iter  40
Accuracy  0.03125
Loss  4.8760586
Optimizer: Adagrad
__________________
For iter  50
Accuracy  0.03125
Loss  4.6596346
Optimizer: Adagrad
__________________
For iter  60
Accuracy  0.03125
Loss  4.5860796
Optimizer: Adagrad
__________________
For iter  70
Accuracy  0.03125
Loss  4.6313457
Optimizer: Adagrad
__________________
For iter  80
Accuracy  0.0234375
Loss  4.7272644
Optimizer: Adagrad
__________________
For iter  90
Accuracy  0.0078125
Loss  4.8280044
Optimizer: Adagrad
__________________
For iter  100
Accuracy  0.0078125
Loss  4.615809
Optimizer: Adagrad
__________________
For iter  110
Accuracy  0.03125
Loss  4.572323
Optimizer: Adagrad
__________________
For iter  120
Accuracy  0.015625
Loss  4.535475
Optimizer: Adagrad
__________________
For iter  130
Accuracy  0.03125
Loss  4.6173563
Optimizer: Adagrad
__________________
For iter  140
Accuracy  0.015625
Loss  4.270857
Optimizer: Adagrad
__________________
For iter  150
Accuracy  0.046875
Loss  4.2968183
Optimizer: Adagrad
__________________
For iter  160
Accuracy  0.0234375
Loss  4.3165026
Optimizer: Adagrad
__________________
For iter  170
Accuracy  0.0234375
Loss  4.270564
Optimizer: Adagrad
__________________
For iter  180
Accuracy  0.0546875
Loss  4.1602163
Optimizer: Adagrad
__________________
For iter  190
Accuracy  0.0390625
Loss  4.225629
Optimizer: Adagrad
__________________
For iter  200
Accuracy  0.0390625
Loss  4.136176
Optimizer: Adagrad
__________________
For iter  210
Accuracy  0.0390625
Loss  4.26178
Optimizer: Adagrad
__________________
For iter  220
Accuracy  0.015625
Loss  4.171321
Optimizer: Adagrad
__________________
For iter  230
Accuracy  0.03125
Loss  4.1322594
Optimizer: Adagrad
__________________
For iter  240
Accuracy  0.046875
Loss  4.147028
Optimizer: Adagrad
__________________
For iter  250
Accuracy  0.0390625
Loss  4.3244953
Optimizer: Adagrad
__________________
For iter  260
Accuracy  0.0234375
Loss  4.1058283
Optimizer: Adagrad
__________________
For iter  270
Accuracy  0.015625
Loss  4.1936817
Optimizer: Adagrad
__________________
For iter  280
Accuracy  0.0390625
Loss  4.0457892
Optimizer: Adagrad
__________________
For iter  290
Accuracy  0.0078125
Loss  4.1118956
Optimizer: Adagrad
__________________
For iter  300
Accuracy  0.03125
Loss  4.156307
Optimizer: Adagrad
__________________
For iter  310
Accuracy  0.03125
Loss  4.0576096
Optimizer: Adagrad
__________________
For iter  320
Accuracy  0.046875
Loss  4.071598
Optimizer: Adagrad
__________________
For iter  330
Accuracy  0.0390625
Loss  4.1264076
Optimizer: Adagrad
__________________
For iter  340
Accuracy  0.0546875
Loss  4.0655317
Optimizer: Adagrad
__________________
For iter  350
Accuracy  0.046875
Loss  4.0903225
Optimizer: Adagrad
__________________
For iter  360
Accuracy  0.0546875
Loss  4.124211
Optimizer: Adagrad
__________________
For iter  370
Accuracy  0.078125
Loss  4.050544
Optimizer: Adagrad
__________________
For iter  380
Accuracy  0.046875
Loss  4.0304136
Optimizer: Adagrad
__________________
For iter  390
Accuracy  0.0234375
Loss  4.0012345
Optimizer: Adagrad
__________________
For iter  400
Accuracy  0.0390625
Loss  4.050556
Optimizer: Adagrad
__________________
For iter  410
Accuracy  0.0625
Loss  4.0369453
Optimizer: Adagrad
__________________
For iter  420
Accuracy  0.0390625
Loss  4.0784726
Optimizer: Adagrad
__________________
For iter  430
Accuracy  0.078125
Loss  4.0179267
Optimizer: Adagrad
__________________
For iter  440
Accuracy  0.03125
Loss  4.1282024
Optimizer: Adagrad
__________________
For iter  450
Accuracy  0.03125
Loss  3.9525914
Optimizer: Adagrad
__________________
For iter  460
Accuracy  0.0546875
Loss  4.0221505
Optimizer: Adagrad
__________________
For iter  470
Accuracy  0.0625
Loss  3.8932595
Optimizer: Adagrad
__________________
For iter  480
Accuracy  0.0546875
Loss  4.1146665
Optimizer: Adagrad
__________________
For iter  490
Accuracy  0.078125
Loss  4.01224
Optimizer: Adagrad
__________________
For iter  500
Accuracy  0.0546875
Loss  3.9601276
Optimizer: Adagrad
__________________
For iter  510
Accuracy  0.0390625
Loss  3.9546442
Optimizer: Adagrad
__________________
For iter  520
Accuracy  0.0390625
Loss  3.9856906
Optimizer: Adagrad
__________________
For iter  530
Accuracy  0.046875
Loss  3.9823997
Optimizer: Adagrad
__________________
For iter  540
Accuracy  0.078125
Loss  3.9813724
Optimizer: Adagrad
__________________
For iter  550
Accuracy  0.078125
Loss  4.0601807
Optimizer: Adagrad
__________________
For iter  560
Accuracy  0.0703125
Loss  4.033061
Optimizer: Adagrad
__________________
For iter  570
Accuracy  0.09375
Loss  3.815436
Optimizer: Adagrad
__________________
For iter  580
Accuracy  0.046875
Loss  3.8267095
Optimizer: Adagrad
__________________
For iter  590
Accuracy  0.0859375
Loss  3.903316
Optimizer: Adagrad
__________________
For iter  600
Accuracy  0.078125
Loss  4.0130415
Optimizer: Adagrad
__________________
For iter  610
Accuracy  0.0546875
Loss  3.893095
Optimizer: Adagrad
__________________
For iter  620
Accuracy  0.0546875
Loss  3.9864583
Optimizer: Adagrad
__________________
For iter  630
Accuracy  0.0859375
Loss  3.9105835
Optimizer: Adagrad
__________________
For iter  640
Accuracy  0.0703125
Loss  3.9169931
Optimizer: Adagrad
__________________
For iter  650
Accuracy  0.0546875
Loss  3.9592538
Optimizer: Adagrad
__________________
For iter  660
Accuracy  0.078125
Loss  3.8199005
Optimizer: Adagrad
__________________
For iter  670
Accuracy  0.0703125
Loss  4.0094147
Optimizer: Adagrad
__________________
For iter  680
Accuracy  0.0546875
Loss  3.8590376
Optimizer: Adagrad
__________________
For iter  690
Accuracy  0.1015625
Loss  3.8913136
Optimizer: Adagrad
__________________
For iter  700
Accuracy  0.0859375
Loss  3.8836496
Optimizer: Adagrad
__________________
For iter  710
Accuracy  0.0859375
Loss  3.969819
Optimizer: Adagrad
__________________
For iter  720
Accuracy  0.0546875
Loss  4.031148
Optimizer: Adagrad
__________________
For iter  730
Accuracy  0.0390625
Loss  3.9243224
Optimizer: Adagrad
__________________
For iter  740
Accuracy  0.109375
Loss  3.8418412
Optimizer: Adagrad
__________________
For iter  750
Accuracy  0.109375
Loss  3.9293058
Optimizer: Adagrad
__________________
For iter  760
Accuracy  0.0859375
Loss  3.7876852
Optimizer: Adagrad
__________________
For iter  770
Accuracy  0.09375
Loss  3.8654532
Optimizer: Adagrad
__________________
For iter  780
Accuracy  0.078125
Loss  3.9165378
Optimizer: Adagrad
__________________
For iter  790
Accuracy  0.0625
Loss  3.8285933
Optimizer: Adagrad
__________________
Testing Accuracy: 0.07582261

Observation for Hyper Parameter tuning the Optimizer:

The model used before performing hyper parameter tuning had an accuracy of 94.53% and a testing accuracy of 68.59% with the Adam Optimizer. Using the Adagrad optimizer did not help improve the accuracy beyond 10% and the tesing accuracy is consistently oor at 7%. Hence, changing the optimizer plays an important role in the RNN Model and selected the right opitmizer is also significant.

Training Accuracy=10% Testing Accuracy=7%

Hyper parameter tuning the Optimizer for RNN

The optimizer used in this case is the RMSProp Optimizer. Reuse the code from the initla model to load the data and reset the graph.

In [84]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Reuse the code to set the hyper parameter values. Ensure to replace the optimizer with RMSProp Optimizer. Closely see comments to view the change in optimizer

In [85]:
reset_graph()
#define constants
#unrolled through 32 time steps
time_steps=32
#hidden LSTM units
num_units=128
#rows of 32 pixels
n_input=32
#learning rate for adam
learning_rate=0.0001

n_classes=116
#size of batch
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
# change the optimization to RMSProp here
opt=tf.train.RMSPropOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    train_loss=[]
    train_accuracy=[]
    epoch_list=[]
    iter=1
    while iter<800:
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            epoch_list.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("Optimizer: RMSProp")
            print("__________________")
                

        iter=iter+1
    test_data = X_test.reshape((-1, time_steps, n_input))
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
    plot_loss_epoch()
    plot_acc_epoch()
num_classes =  116
For iter  10
Accuracy  0.0
Loss  6.7777185
Optimizer: RMSProp
__________________
For iter  20
Accuracy  0.015625
Loss  6.4341493
Optimizer: RMSProp
__________________
For iter  30
Accuracy  0.0
Loss  6.192767
Optimizer: RMSProp
__________________
For iter  40
Accuracy  0.015625
Loss  5.5503263
Optimizer: RMSProp
__________________
For iter  50
Accuracy  0.0078125
Loss  5.8208075
Optimizer: RMSProp
__________________
For iter  60
Accuracy  0.0
Loss  5.414848
Optimizer: RMSProp
__________________
For iter  70
Accuracy  0.0234375
Loss  5.4313374
Optimizer: RMSProp
__________________
For iter  80
Accuracy  0.03125
Loss  5.024989
Optimizer: RMSProp
__________________
For iter  90
Accuracy  0.015625
Loss  5.0283976
Optimizer: RMSProp
__________________
For iter  100
Accuracy  0.0
Loss  4.806094
Optimizer: RMSProp
__________________
For iter  110
Accuracy  0.0234375
Loss  4.693117
Optimizer: RMSProp
__________________
For iter  120
Accuracy  0.015625
Loss  4.547054
Optimizer: RMSProp
__________________
For iter  130
Accuracy  0.015625
Loss  4.5622697
Optimizer: RMSProp
__________________
For iter  140
Accuracy  0.0546875
Loss  4.309982
Optimizer: RMSProp
__________________
For iter  150
Accuracy  0.0546875
Loss  4.28438
Optimizer: RMSProp
__________________
For iter  160
Accuracy  0.0546875
Loss  4.3418484
Optimizer: RMSProp
__________________
For iter  170
Accuracy  0.0859375
Loss  4.129177
Optimizer: RMSProp
__________________
For iter  180
Accuracy  0.0859375
Loss  4.173422
Optimizer: RMSProp
__________________
For iter  190
Accuracy  0.0859375
Loss  4.0660763
Optimizer: RMSProp
__________________
For iter  200
Accuracy  0.0859375
Loss  4.025037
Optimizer: RMSProp
__________________
For iter  210
Accuracy  0.125
Loss  3.942927
Optimizer: RMSProp
__________________
For iter  220
Accuracy  0.1015625
Loss  3.852069
Optimizer: RMSProp
__________________
For iter  230
Accuracy  0.1171875
Loss  3.938765
Optimizer: RMSProp
__________________
For iter  240
Accuracy  0.1328125
Loss  3.8873734
Optimizer: RMSProp
__________________
For iter  250
Accuracy  0.1171875
Loss  3.8375635
Optimizer: RMSProp
__________________
For iter  260
Accuracy  0.140625
Loss  3.6773849
Optimizer: RMSProp
__________________
For iter  270
Accuracy  0.140625
Loss  3.627279
Optimizer: RMSProp
__________________
For iter  280
Accuracy  0.1015625
Loss  3.7418327
Optimizer: RMSProp
__________________
For iter  290
Accuracy  0.234375
Loss  3.5335393
Optimizer: RMSProp
__________________
For iter  300
Accuracy  0.171875
Loss  3.6320534
Optimizer: RMSProp
__________________
For iter  310
Accuracy  0.125
Loss  3.533934
Optimizer: RMSProp
__________________
For iter  320
Accuracy  0.1796875
Loss  3.4594364
Optimizer: RMSProp
__________________
For iter  330
Accuracy  0.2265625
Loss  3.454009
Optimizer: RMSProp
__________________
For iter  340
Accuracy  0.21875
Loss  3.3141575
Optimizer: RMSProp
__________________
For iter  350
Accuracy  0.203125
Loss  3.3043318
Optimizer: RMSProp
__________________
For iter  360
Accuracy  0.1875
Loss  3.3993862
Optimizer: RMSProp
__________________
For iter  370
Accuracy  0.1796875
Loss  3.3863292
Optimizer: RMSProp
__________________
For iter  380
Accuracy  0.25
Loss  3.2855496
Optimizer: RMSProp
__________________
For iter  390
Accuracy  0.203125
Loss  3.211115
Optimizer: RMSProp
__________________
For iter  400
Accuracy  0.1484375
Loss  3.4754472
Optimizer: RMSProp
__________________
For iter  410
Accuracy  0.1953125
Loss  3.142469
Optimizer: RMSProp
__________________
For iter  420
Accuracy  0.1484375
Loss  3.420929
Optimizer: RMSProp
__________________
For iter  430
Accuracy  0.2265625
Loss  3.1168826
Optimizer: RMSProp
__________________
For iter  440
Accuracy  0.296875
Loss  3.0128922
Optimizer: RMSProp
__________________
For iter  450
Accuracy  0.203125
Loss  3.1756475
Optimizer: RMSProp
__________________
For iter  460
Accuracy  0.265625
Loss  3.247863
Optimizer: RMSProp
__________________
For iter  470
Accuracy  0.2578125
Loss  3.0245748
Optimizer: RMSProp
__________________
For iter  480
Accuracy  0.34375
Loss  2.6804101
Optimizer: RMSProp
__________________
For iter  490
Accuracy  0.25
Loss  2.935113
Optimizer: RMSProp
__________________
For iter  500
Accuracy  0.2734375
Loss  2.9302611
Optimizer: RMSProp
__________________
For iter  510
Accuracy  0.296875
Loss  2.9805741
Optimizer: RMSProp
__________________
For iter  520
Accuracy  0.34375
Loss  2.7288642
Optimizer: RMSProp
__________________
For iter  530
Accuracy  0.25
Loss  2.8693738
Optimizer: RMSProp
__________________
For iter  540
Accuracy  0.2890625
Loss  2.676652
Optimizer: RMSProp
__________________
For iter  550
Accuracy  0.328125
Loss  2.7646594
Optimizer: RMSProp
__________________
For iter  560
Accuracy  0.328125
Loss  2.7227154
Optimizer: RMSProp
__________________
For iter  570
Accuracy  0.34375
Loss  2.7616634
Optimizer: RMSProp
__________________
For iter  580
Accuracy  0.3359375
Loss  2.6452994
Optimizer: RMSProp
__________________
For iter  590
Accuracy  0.375
Loss  2.5551407
Optimizer: RMSProp
__________________
For iter  600
Accuracy  0.296875
Loss  2.9669392
Optimizer: RMSProp
__________________
For iter  610
Accuracy  0.3203125
Loss  2.762218
Optimizer: RMSProp
__________________
For iter  620
Accuracy  0.3359375
Loss  2.6224425
Optimizer: RMSProp
__________________
For iter  630
Accuracy  0.375
Loss  2.7299838
Optimizer: RMSProp
__________________
For iter  640
Accuracy  0.4375
Loss  2.221561
Optimizer: RMSProp
__________________
For iter  650
Accuracy  0.4296875
Loss  2.38839
Optimizer: RMSProp
__________________
For iter  660
Accuracy  0.3359375
Loss  2.3618631
Optimizer: RMSProp
__________________
For iter  670
Accuracy  0.328125
Loss  2.5142317
Optimizer: RMSProp
__________________
For iter  680
Accuracy  0.421875
Loss  2.3249574
Optimizer: RMSProp
__________________
For iter  690
Accuracy  0.5
Loss  2.2209835
Optimizer: RMSProp
__________________
For iter  700
Accuracy  0.3359375
Loss  2.4480562
Optimizer: RMSProp
__________________
For iter  710
Accuracy  0.34375
Loss  2.5618715
Optimizer: RMSProp
__________________
For iter  720
Accuracy  0.484375
Loss  2.1031415
Optimizer: RMSProp
__________________
For iter  730
Accuracy  0.40625
Loss  2.3101864
Optimizer: RMSProp
__________________
For iter  740
Accuracy  0.4296875
Loss  2.3511937
Optimizer: RMSProp
__________________
For iter  750
Accuracy  0.4296875
Loss  2.3885038
Optimizer: RMSProp
__________________
For iter  760
Accuracy  0.3828125
Loss  2.2813978
Optimizer: RMSProp
__________________
For iter  770
Accuracy  0.3515625
Loss  2.3951564
Optimizer: RMSProp
__________________
For iter  780
Accuracy  0.3828125
Loss  2.3622518
Optimizer: RMSProp
__________________
For iter  790
Accuracy  0.421875
Loss  2.1795938
Optimizer: RMSProp
__________________
Testing Accuracy: 0.34620887
In [ ]:
 

Observation for Hyper Parameter Tuning Optimizer

Using the RMSPRop optimizer also did not help much but did improve the accuracy to 50%.

Train Accuracy = 50% Test Accuracy=34.62%

Final observation for Hyper Parameter Tuning Optimizer

Hence, RMSProp optimizer did help but from the above 3 trials it is observed that the Adam Optimizer performed best over Adagrad and RMSProp.

The optimizer is prospective hyper parameter to tune the RNN LSTM model. Though the Adam Optimizer out performed all the other optimizers

Hyper Parameter Tuning Activation Function for RNN

Activation Functions

We tune the following activation function values : Softsign and Relu.

Lets start by tuning with activation function Sof Sign. Reuse the code for initial model for resetting the graph and loading the data.

In [7]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Reuse the code for initializing the hyper parameters from the previous model. Add an additional parameter activation in the Basic LSTM Cell. Observe the comments closely to view the change made.

In [8]:
reset_graph()
#define constants
#unrolled through 32 time steps
time_steps=32
#hidden LSTM units
num_units=128
#rows of 32 pixels
n_input=32
#learning rate for adam
learning_rate=0.001

n_classes=116
#size of batch
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
# add activation softsign . By default value taken is Tanh
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1,activation=tf.nn.softsign)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    train_loss=[]
    train_accuracy=[]
    epoch_list=[]
    iter=1
    while iter<800:
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            epoch_list.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("Activation: Softsign")
            print("__________________")
                

        iter=iter+1
    test_data = X_test.reshape((-1, time_steps, n_input))
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
    plot_loss_epoch()
    plot_acc_epoch()
num_classes =  116
For iter  10
Accuracy  0.015625
Loss  4.7175884
Activation: Softsign
__________________
For iter  20
Accuracy  0.046875
Loss  4.334697
Activation: Softsign
__________________
For iter  30
Accuracy  0.0625
Loss  4.12254
Activation: Softsign
__________________
For iter  40
Accuracy  0.0546875
Loss  4.0548725
Activation: Softsign
__________________
For iter  50
Accuracy  0.1875
Loss  3.6804206
Activation: Softsign
__________________
For iter  60
Accuracy  0.09375
Loss  3.822148
Activation: Softsign
__________________
For iter  70
Accuracy  0.1328125
Loss  3.4799075
Activation: Softsign
__________________
For iter  80
Accuracy  0.1875
Loss  3.2454371
Activation: Softsign
__________________
For iter  90
Accuracy  0.1640625
Loss  3.2458687
Activation: Softsign
__________________
For iter  100
Accuracy  0.234375
Loss  3.190152
Activation: Softsign
__________________
For iter  110
Accuracy  0.25
Loss  2.7250752
Activation: Softsign
__________________
For iter  120
Accuracy  0.2890625
Loss  2.7306232
Activation: Softsign
__________________
For iter  130
Accuracy  0.34375
Loss  2.6096902
Activation: Softsign
__________________
For iter  140
Accuracy  0.3359375
Loss  2.2568102
Activation: Softsign
__________________
For iter  150
Accuracy  0.4375
Loss  2.125117
Activation: Softsign
__________________
For iter  160
Accuracy  0.4453125
Loss  2.083714
Activation: Softsign
__________________
For iter  170
Accuracy  0.4296875
Loss  2.0802383
Activation: Softsign
__________________
For iter  180
Accuracy  0.5234375
Loss  1.8995483
Activation: Softsign
__________________
For iter  190
Accuracy  0.4296875
Loss  2.0497527
Activation: Softsign
__________________
For iter  200
Accuracy  0.4921875
Loss  1.8546612
Activation: Softsign
__________________
For iter  210
Accuracy  0.609375
Loss  1.5025296
Activation: Softsign
__________________
For iter  220
Accuracy  0.4609375
Loss  1.6449848
Activation: Softsign
__________________
For iter  230
Accuracy  0.609375
Loss  1.5639114
Activation: Softsign
__________________
For iter  240
Accuracy  0.625
Loss  1.4586143
Activation: Softsign
__________________
For iter  250
Accuracy  0.5859375
Loss  1.4563407
Activation: Softsign
__________________
For iter  260
Accuracy  0.6875
Loss  1.1435959
Activation: Softsign
__________________
For iter  270
Accuracy  0.578125
Loss  1.5808682
Activation: Softsign
__________________
For iter  280
Accuracy  0.640625
Loss  1.4233596
Activation: Softsign
__________________
For iter  290
Accuracy  0.5625
Loss  1.5063126
Activation: Softsign
__________________
For iter  300
Accuracy  0.6875
Loss  1.075028
Activation: Softsign
__________________
For iter  310
Accuracy  0.6171875
Loss  1.1015708
Activation: Softsign
__________________
For iter  320
Accuracy  0.6953125
Loss  0.9770915
Activation: Softsign
__________________
For iter  330
Accuracy  0.671875
Loss  1.1336055
Activation: Softsign
__________________
For iter  340
Accuracy  0.7890625
Loss  0.865546
Activation: Softsign
__________________
For iter  350
Accuracy  0.5625
Loss  1.2758231
Activation: Softsign
__________________
For iter  360
Accuracy  0.6484375
Loss  1.1358497
Activation: Softsign
__________________
For iter  370
Accuracy  0.703125
Loss  0.98097026
Activation: Softsign
__________________
For iter  380
Accuracy  0.640625
Loss  1.092413
Activation: Softsign
__________________
For iter  390
Accuracy  0.6640625
Loss  1.1819445
Activation: Softsign
__________________
For iter  400
Accuracy  0.703125
Loss  1.0152752
Activation: Softsign
__________________
For iter  410
Accuracy  0.7265625
Loss  0.9248463
Activation: Softsign
__________________
For iter  420
Accuracy  0.7578125
Loss  0.7921268
Activation: Softsign
__________________
For iter  430
Accuracy  0.75
Loss  0.9514612
Activation: Softsign
__________________
For iter  440
Accuracy  0.78125
Loss  0.76005685
Activation: Softsign
__________________
For iter  450
Accuracy  0.7890625
Loss  0.6531165
Activation: Softsign
__________________
For iter  460
Accuracy  0.78125
Loss  0.79630345
Activation: Softsign
__________________
For iter  470
Accuracy  0.8046875
Loss  0.75520694
Activation: Softsign
__________________
For iter  480
Accuracy  0.78125
Loss  0.8290948
Activation: Softsign
__________________
For iter  490
Accuracy  0.7578125
Loss  0.7324039
Activation: Softsign
__________________
For iter  500
Accuracy  0.7109375
Loss  0.85368395
Activation: Softsign
__________________
For iter  510
Accuracy  0.84375
Loss  0.5965813
Activation: Softsign
__________________
For iter  520
Accuracy  0.8203125
Loss  0.6190038
Activation: Softsign
__________________
For iter  530
Accuracy  0.7890625
Loss  0.72364676
Activation: Softsign
__________________
For iter  540
Accuracy  0.734375
Loss  0.9237298
Activation: Softsign
__________________
For iter  550
Accuracy  0.7734375
Loss  0.7240307
Activation: Softsign
__________________
For iter  560
Accuracy  0.78125
Loss  0.80631864
Activation: Softsign
__________________
For iter  570
Accuracy  0.78125
Loss  0.66878515
Activation: Softsign
__________________
For iter  580
Accuracy  0.7734375
Loss  0.524693
Activation: Softsign
__________________
For iter  590
Accuracy  0.8359375
Loss  0.43537915
Activation: Softsign
__________________
For iter  600
Accuracy  0.8125
Loss  0.63212514
Activation: Softsign
__________________
For iter  610
Accuracy  0.8046875
Loss  0.49453866
Activation: Softsign
__________________
For iter  620
Accuracy  0.84375
Loss  0.57524693
Activation: Softsign
__________________
For iter  630
Accuracy  0.8046875
Loss  0.5607844
Activation: Softsign
__________________
For iter  640
Accuracy  0.8359375
Loss  0.53033423
Activation: Softsign
__________________
For iter  650
Accuracy  0.8359375
Loss  0.5797418
Activation: Softsign
__________________
For iter  660
Accuracy  0.8203125
Loss  0.6178523
Activation: Softsign
__________________
For iter  670
Accuracy  0.84375
Loss  0.48538494
Activation: Softsign
__________________
For iter  680
Accuracy  0.84375
Loss  0.48579124
Activation: Softsign
__________________
For iter  690
Accuracy  0.84375
Loss  0.5097758
Activation: Softsign
__________________
For iter  700
Accuracy  0.765625
Loss  0.59784585
Activation: Softsign
__________________
For iter  710
Accuracy  0.8671875
Loss  0.48035842
Activation: Softsign
__________________
For iter  720
Accuracy  0.8515625
Loss  0.5511935
Activation: Softsign
__________________
For iter  730
Accuracy  0.828125
Loss  0.48220938
Activation: Softsign
__________________
For iter  740
Accuracy  0.8203125
Loss  0.5467706
Activation: Softsign
__________________
For iter  750
Accuracy  0.828125
Loss  0.47274587
Activation: Softsign
__________________
For iter  760
Accuracy  0.875
Loss  0.4333256
Activation: Softsign
__________________
For iter  770
Accuracy  0.8203125
Loss  0.45645633
Activation: Softsign
__________________
For iter  780
Accuracy  0.9296875
Loss  0.27590883
Activation: Softsign
__________________
For iter  790
Accuracy  0.875
Loss  0.39048547
Activation: Softsign
__________________
Testing Accuracy: 0.6881259

Observation for tuning Activation Function for RNN

Teh default activation used when activation is not specified is Tanh. The first model used the default activation of Tanh. Using the Softsign activation function did not improve the accuracy over the existing high accuracy of 94.53% and testing accuracy of 68.59. The Activation function Softsign provides a training accuracy of 92% and a minute increase in testing accuracy of 68.8%. But it would be worth observing the change over more epochs as well

Hyper Parameter Tuning Activation Function for RNN

Next we will tune tthe activation function Relu. Loading the data remains the same as the previous model

In [9]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Dataset consists of a subset of HasyV2 data.Only the alphanumberic characters
# Code referenced -https://www.kaggle.com/usersumit/alphanumeric-handwritten-dataset/data
# The data subset has been drawn from https://github.com/sumit-kothari/AlphaNum-HASYv2/tree/master/output_data_alpha_num
X_FNAME = "alphanum-hasy-data-X.npy"
Y_FNAME = "alphanum-hasy-data-y.npy"
SYMBOL_FNAME = "symbols.csv"

X_load = np.load(X_FNAME)
y_load = np.load(Y_FNAME)
SYMBOLS = pd.read_csv(SYMBOL_FNAME) 
SYMBOLS = SYMBOLS[["symbol_id", "latex"]]

#This is using the Scikit Learn Library
X_train, X_test, y_train, y_test = train_test_split(X_load, y_load, test_size=0.3)

print("Train dataset shape")
print(X_train.shape, y_train.shape)
print("Test dataset shape")
print(X_test.shape, y_test.shape)
Train dataset shape
(3260, 32, 32) (3260,)
Test dataset shape
(1398, 32, 32) (1398,)

Intialize the hyper Parameter as in the initial model code. Ensure to change the activation function. Closely see the comments to see the change in activation functions

In [10]:
reset_graph()
#define constants
#unrolled through 28 time steps
time_steps=32
#hidden LSTM units
num_units=128
#rows of 28 pixels
n_input=32
#learning rate for adam
learning_rate=0.001
#mnist is meant to be classified in 10 classes(0-9).
n_classes=116
#size of batch
batch_size=128

# Normalize the training set
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]
print("num_classes = ", num_classes)

out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

#processing the input tensor from [batch_size,n_steps,n_input] to "time_steps" number of [batch_size,n_input] tensors
input=tf.unstack(x ,time_steps,1)

#defining the network
#change the activation function to relu
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1,activation=tf.nn.relu)
outputs,_=rnn.static_rnn(lstm_layer,input,dtype="float32")

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels. 
    '''
    idx = np.arange(0 , len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[ i] for i in idx]
    labels_shuffle = [labels[ i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

# Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
# print(Xtr)
# print(Ytr)

train_loss=[]
train_accuracy=[]
epoch_list=[]

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    train_loss=[]
    train_accuracy=[]
    epoch_list=[]
    iter=1
    while iter<800:
        batch_x,batch_y=next_batch(batch_size,X_train,y_train)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            epoch_list.append(iter)
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            train_loss.append(los)
            train_accuracy.append(acc)
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("Activation: Relu")
            print("__________________")
                

        iter=iter+1
    test_data = X_test.reshape((-1, time_steps, n_input))
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: y_test}))
    
    plot_loss_epoch()
    plot_acc_epoch()
num_classes =  116
WARNING:tensorflow:From <ipython-input-10-245da1f07dbe>:47: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

For iter  10
Accuracy  0.0
Loss  5.0527167
Activation: Relu
__________________
For iter  20
Accuracy  0.0234375
Loss  4.9464087
Activation: Relu
__________________
For iter  30
Accuracy  0.0234375
Loss  4.5837727
Activation: Relu
__________________
For iter  40
Accuracy  0.0390625
Loss  4.4056253
Activation: Relu
__________________
For iter  50
Accuracy  0.1171875
Loss  4.416363
Activation: Relu
__________________
For iter  60
Accuracy  0.1015625
Loss  4.1089754
Activation: Relu
__________________
For iter  70
Accuracy  0.125
Loss  4.010038
Activation: Relu
__________________
For iter  80
Accuracy  0.125
Loss  3.995918
Activation: Relu
__________________
For iter  90
Accuracy  0.1953125
Loss  3.463012
Activation: Relu
__________________
For iter  100
Accuracy  0.25
Loss  3.4841132
Activation: Relu
__________________
For iter  110
Accuracy  0.25
Loss  3.3312192
Activation: Relu
__________________
For iter  120
Accuracy  0.265625
Loss  2.948328
Activation: Relu
__________________
For iter  130
Accuracy  0.296875
Loss  2.8267708
Activation: Relu
__________________
For iter  140
Accuracy  0.3203125
Loss  2.5039868
Activation: Relu
__________________
For iter  150
Accuracy  0.2890625
Loss  2.860752
Activation: Relu
__________________
For iter  160
Accuracy  0.40625
Loss  2.427497
Activation: Relu
__________________
For iter  170
Accuracy  0.453125
Loss  1.9621114
Activation: Relu
__________________
For iter  180
Accuracy  0.4609375
Loss  2.1116037
Activation: Relu
__________________
For iter  190
Accuracy  0.3984375
Loss  2.150799
Activation: Relu
__________________
For iter  200
Accuracy  0.3828125
Loss  2.2380667
Activation: Relu
__________________
For iter  210
Accuracy  0.4375
Loss  1.9459285
Activation: Relu
__________________
For iter  220
Accuracy  0.4453125
Loss  2.1056619
Activation: Relu
__________________
For iter  230
Accuracy  0.5546875
Loss  1.6509309
Activation: Relu
__________________
For iter  240
Accuracy  0.5859375
Loss  1.4902887
Activation: Relu
__________________
For iter  250
Accuracy  0.5390625
Loss  1.7447681
Activation: Relu
__________________
For iter  260
Accuracy  0.65625
Loss  1.3180416
Activation: Relu
__________________
For iter  270
Accuracy  0.578125
Loss  1.4986835
Activation: Relu
__________________
For iter  280
Accuracy  0.6875
Loss  1.0974398
Activation: Relu
__________________
For iter  290
Accuracy  0.703125
Loss  1.127217
Activation: Relu
__________________
For iter  300
Accuracy  0.6484375
Loss  1.1635102
Activation: Relu
__________________
For iter  310
Accuracy  0.625
Loss  1.3602376
Activation: Relu
__________________
For iter  320
Accuracy  0.703125
Loss  1.1693549
Activation: Relu
__________________
For iter  330
Accuracy  0.7109375
Loss  0.9166385
Activation: Relu
__________________
For iter  340
Accuracy  0.6640625
Loss  1.2525232
Activation: Relu
__________________
For iter  350
Accuracy  0.7109375
Loss  1.0472945
Activation: Relu
__________________
For iter  360
Accuracy  0.703125
Loss  1.0128009
Activation: Relu
__________________
For iter  370
Accuracy  0.7578125
Loss  0.8694738
Activation: Relu
__________________
For iter  380
Accuracy  0.796875
Loss  0.7471708
Activation: Relu
__________________
For iter  390
Accuracy  0.7265625
Loss  0.96970725
Activation: Relu
__________________
For iter  400
Accuracy  0.7421875
Loss  0.95777863
Activation: Relu
__________________
For iter  410
Accuracy  0.765625
Loss  0.6868167
Activation: Relu
__________________
For iter  420
Accuracy  0.7578125
Loss  0.86648977
Activation: Relu
__________________
For iter  430
Accuracy  0.6953125
Loss  0.91503376
Activation: Relu
__________________
For iter  440
Accuracy  0.7734375
Loss  0.7849251
Activation: Relu
__________________
For iter  450
Accuracy  0.7890625
Loss  0.746532
Activation: Relu
__________________
For iter  460
Accuracy  0.796875
Loss  0.77260596
Activation: Relu
__________________
For iter  470
Accuracy  0.7734375
Loss  0.68501973
Activation: Relu
__________________
For iter  480
Accuracy  0.8125
Loss  0.6105933
Activation: Relu
__________________
For iter  490
Accuracy  0.7890625
Loss  0.7383681
Activation: Relu
__________________
For iter  500
Accuracy  0.7578125
Loss  0.7254371
Activation: Relu
__________________
For iter  510
Accuracy  0.8359375
Loss  0.5173213
Activation: Relu
__________________
For iter  520
Accuracy  0.7890625
Loss  0.6807629
Activation: Relu
__________________
For iter  530
Accuracy  0.7578125
Loss  0.7322081
Activation: Relu
__________________
For iter  540
Accuracy  0.7421875
Loss  0.64751244
Activation: Relu
__________________
For iter  550
Accuracy  0.7890625
Loss  0.79038155
Activation: Relu
__________________
For iter  560
Accuracy  0.828125
Loss  0.5899957
Activation: Relu
__________________
For iter  570
Accuracy  0.8125
Loss  0.69123256
Activation: Relu
__________________
For iter  580
Accuracy  0.8203125
Loss  0.486332
Activation: Relu
__________________
For iter  590
Accuracy  0.8046875
Loss  0.61068976
Activation: Relu
__________________
For iter  600
Accuracy  0.8359375
Loss  0.5790945
Activation: Relu
__________________
For iter  610
Accuracy  0.859375
Loss  0.5063443
Activation: Relu
__________________
For iter  620
Accuracy  0.875
Loss  0.4879489
Activation: Relu
__________________
For iter  630
Accuracy  0.875
Loss  0.3823954
Activation: Relu
__________________
For iter  640
Accuracy  0.8125
Loss  0.63358545
Activation: Relu
__________________
For iter  650
Accuracy  0.828125
Loss  0.5577567
Activation: Relu
__________________
For iter  660
Accuracy  0.8125
Loss  0.5313163
Activation: Relu
__________________
For iter  670
Accuracy  0.828125
Loss  0.5578128
Activation: Relu
__________________
For iter  680
Accuracy  0.8828125
Loss  0.33814424
Activation: Relu
__________________
For iter  690
Accuracy  0.796875
Loss  0.5178468
Activation: Relu
__________________
For iter  700
Accuracy  0.8984375
Loss  0.41072273
Activation: Relu
__________________
For iter  710
Accuracy  0.859375
Loss  0.41923547
Activation: Relu
__________________
For iter  720
Accuracy  0.8828125
Loss  0.36245415
Activation: Relu
__________________
For iter  730
Accuracy  0.8828125
Loss  0.3780375
Activation: Relu
__________________
For iter  740
Accuracy  0.8671875
Loss  0.32713825
Activation: Relu
__________________
For iter  750
Accuracy  0.84375
Loss  0.37612647
Activation: Relu
__________________
For iter  760
Accuracy  0.90625
Loss  0.4959477
Activation: Relu
__________________
For iter  770
Accuracy  0.8828125
Loss  0.3356775
Activation: Relu
__________________
For iter  780
Accuracy  0.90625
Loss  0.3029647
Activation: Relu
__________________
For iter  790
Accuracy  0.890625
Loss  0.35482574
Activation: Relu
__________________
Testing Accuracy: 0.6773963

Observation for Hyper Parameter tuning Activation Function:

Using the Relu activation function did not have much effect on the accuracy. As the accuracy decreased by changing the activation to Relu. But it is prospective to try various activation functions as they seem to have an effect on the accuracy and performance of the network.

Training Accuracy - 89% Testing Accuracy -67.7%

Final Observation for Hyper Parameter Tuning Activation Function

It is observed that Tanh performed the best. But it would be interesting to note the change in performance for larger number of epochs for Adagrad optimizer.

Hence Activation Functions must definitely be a prospective hyper parameter tune the RNN LSTM model with preferable values of Tanh or Adagrad.

Final Results for Hyper Parameter Tuning the RNN

Parameters to tune for an RNN are as follows :

  1. Number of epochs of 2000 performed the best. This a strong candidate for improving the performance of the RNN Model
  2. Number of Neurons : Increase in the number of neurons improved the eprformance of the Neural Network. Decreaasing the count did not help. Hence , Number of Neurons as 512 performed the best.
  3. Batch Size : Decreasing the batch size did not help the performance of the Neural Network , but a batch size of 512 produced the best results
  4. Activation : Activation function Tanh performed the best. But even theActivation function Softsign is a prospective candidate among the three compared
  5. Optimizers : The Adam optimizer was the clear winner in comparison to Adagrad and RMSPRop Optimizer

The prospective parameter to tune for an RNN LSTM model are Number of Epochs, Learning rate, Batch size, Number of neurons, optimizers and activation functions. Though the combination of Learning Rate and number of neurons did notimprove performance of the model and can be optinally tuned.

Summary

image.png

Overview of RBM

A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. These widely used in deep belief networks . They are popularly used for classification problems as well

Dataset Description

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. Each image as dimensions of28X28 images.The dataset consists of pair, “handwritten digit image” and “label”. Digit ranges from 0 to 9, meaning 10 patterns in total.

handwritten digit image: This is gray scale image with size 28 x 28 pixel. label : This is actual digit number this handwritten digit image represents. It is either 0 to 9.

This is popular dataset set used in Datascience and a popular Hello World Dataset

Dataset Download and Use

We will be using the MNIST Datset from the Tensorflow Library. This will be automatically used from the tensorflow code. No pre- requisites are present.

image.png

Install library xRBM

pip install xrbm

You can also refer to the below website for more details

https://github.com/omimo/xRBM

Structure of the RBM Used

The number of the visible units of the RBM equals to the number of the dimensions of the training data. Number of visible units selected 784 which is the total dimension of the input data. The number of hidden units are 200. The batch size selected is 128 and number of epochs is 100 for the first trial. This uses the Contrastive Divergence with K Gibbs Samples.

Let see the Tensorflow code for the RBM

Restricted Boltzmann Machine Neural network

Tensorflow Code for Initial model of RBM

Lets see the tensorflow code for RBM in steps.

Lets import the necessary libraries first. After which we download the data from the tensorflow libraray automatically. Following which we will initialize the hyper parameters. Next we will define placeholders, weights and bias and apply contrastive divergence with gibb sampling. Lets see how the code looks below

In [1]:
import numpy as np
import tensorflow as tf

%matplotlib inline
import matplotlib.pyplot as plt
from IPython import display

#Uncomment the below lines if you didn't install xRBM using pip and want to use the local code instead 
#import sys
#sys.path.append('../')
C:\Users\jaini\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
In [2]:
import xrbm.models
import xrbm.train
import xrbm.losses
from xrbm.utils.vizutils import *

Downloading the data from the tensorflow libraray and extracting the files

In [3]:
from tensorflow.examples.tutorials.mnist import input_data

data_sets = input_data.read_data_sets('MNIST_data', False)
training_data = data_sets.train.images
Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz

As mentioned above the num_vis is number of visible layers which is the total dimension of the image. In this case that is 784. We initliaze number of epochs, number of hidden layers, batch size as well

In [18]:
num_vis         = training_data[0].shape[0] #=784
num_hid         = 200
learning_rate   = 0.1
batch_size      = 100
training_epochs = 100

We reset the graph inorder to initilaize all variables. We will then apply the hyper parameters to the RBM Model.

In [19]:
# Let's reset the tensorflow graph in case we want to rerun the code
tf.reset_default_graph()
# apply the parameters to the RBM Model. The syntax is specific to the xRBM Library
rbm = xrbm.models.RBM(num_vis=num_vis, num_hid=num_hid, name='rbm_mnist')

Next we create mini batches

In [20]:
batch_idxs = np.random.permutation(range(len(training_data)))
n_batches  = len(batch_idxs) // batch_size

We create a placeholder for the mini-batch data during training.

We use the CD-k algorithm for training the RBM. For this, we create an instance of the CDApproximator from the xrbm.train module and pass the learning rate to it.

We then define our training op using the CDApproximator's train method, passing the RBM model and the placeholder for the data.

In order to monitor the training process, we calculate the reconstruction cost of the model at each epoch, using the rec_cost_op.

The CD-k algorithm is contrastive divergence used for approximation. For every input it starts a markov chain.

batch_data = tf.placeholder(tf.float32, shape=(None, num_vis))

cdapproximator = xrbm.train.CDApproximator(learning_rate=learning_rate) train_op = cdapproximator.train(rbm, vis_data=batch_data)

reconstructeddata,,, = rbm.gibbs_sample_vhv(batch_data) xentropy_rec_cost = xrbm.losses.cross_entropy(batch_data, reconstructed_data)

LEts create the Tensorflow session and run the results . Observe the commenets in the code closely for greater details. We will be using the Reconstruction cost as a measure and the visual representation of the data for evaluation
In [22]:
# Create figure first so that we use the same one to draw the filters on during the training
# initlaize lists in order to strore the reconstruction cost and epochs
recon_cost=[]
epoch_list=[]
fig = plt.figure(figsize=(12,8))

with tf.Session() as sess:    
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        for batch_i in range(n_batches):
            # Get just minibatch amount of data
            idxs_i = batch_idxs[batch_i * batch_size:(batch_i + 1) * batch_size]
            
            # Run the training step
            sess.run(train_op, feed_dict={batch_data: training_data[idxs_i]})
            
          # compute the reconstruction cost  
    
        reconstruction_cost = sess.run(xentropy_rec_cost, feed_dict={batch_data: training_data})

        #Print the reconstruction cost
        
        title = ('Epoch %i / %i | Reconstruction Cost = %f'%(epoch, training_epochs, reconstruction_cost))
        
        print(title)
     #plot the final image   
    W = rbm.W.eval().transpose()
    filters_grid = create_2d_filters_grid(W, filter_shape=(28,28), grid_size=(10, 20), grid_gap=(1,1))    
    plt.title(title)
    plt.imshow(filters_grid, cmap='gray')
    display.clear_output(wait=True)
    display.display(fig)

Intial observation of the RBM Model:

This was run for 100 epochs. The genration of the images and running the algorithm without additional computation takes extremely long. The reconstruction cost for 100 epochs is -524.17, which is high indicating the model must be run for a longer period of time to reduce the reconstruction cost.

Let us try and see if we can evaluate hyper parameters that can reduce the reconstruction cost.

Hyperparameter Tuning using RBM

We will tune the RBM for activation, regularization, net work initalization, number of hidden layers, regularization and activation.

Since, training the RBM takes extremely long we are going to train it only for 50 epochs. Lets strat with learning rate values 0.01 and 0.00001

Hyper Parameter tuning Learning rate for RBM

We willselect learning rate of 0.01 first. Thecode for initializing the hyper parameter values and running the tensorflow session will remain the same . It is depicted as below. Lets create a function to plot the reconstruction cost vs epochs

In [5]:
# Plot the reconstruction cost vs epochs
def plot_loss_epoch():
    plt.figure(figsize=(18, 5))
    plt.subplot(1, 2, 1)
    plt.title('Cost vs Epoch', fontsize=15)
    plt.plot(epoch_list, cost, 'r-')
    plt.xlabel('Epoch')
    plt.ylabel('Reconstruction Cost')

Reuse the code from the initla model to tinitialize the hyper parameter values, create the mini batch and run the tensorflow session. Only edit must be made to the learning rate. The learning rate must be modified to 0.01 . See the comments closely for details of change

In [17]:
# reset graph function
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

    
tf.reset_default_graph()

cost=[]
epoch_list=[]
# change learning rate to 0.01
num_vis         = training_data[0].shape[0] #=784
num_hid         = 200
learning_rate   = 0.01
batch_size      = 100
training_epochs = 50


rbm = xrbm.models.RBM(num_vis=num_vis, num_hid=num_hid, name='rbm_mnist')

batch_idxs = np.random.permutation(range(len(training_data)))
n_batches  = len(batch_idxs) // batch_size

batch_data     = tf.placeholder(tf.float32, shape=(None, num_vis))

cdapproximator = xrbm.train.CDApproximator(learning_rate=learning_rate)
train_op       = cdapproximator.train(rbm, vis_data=batch_data)

reconstructed_data,_,_,_ = rbm.gibbs_sample_vhv(batch_data)
xentropy_rec_cost  = xrbm.losses.cross_entropy(batch_data, reconstructed_data)

# Create figure first so that we use the same one to draw the filters on during the training
fig = plt.figure(figsize=(12,8))

with tf.Session() as sess:    
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        for batch_i in range(n_batches):
            # Get just minibatch amount of data
            idxs_i = batch_idxs[batch_i * batch_size:(batch_i + 1) * batch_size]
            
            # Run the training step
            sess.run(train_op, feed_dict={batch_data: training_data[idxs_i]})
    
        reconstruction_cost = sess.run(xentropy_rec_cost, feed_dict={batch_data: training_data})
        

        W = rbm.W.eval().transpose()
        filters_grid = create_2d_filters_grid(W, filter_shape=(28,28), grid_size=(10, 20), grid_gap=(1,1))
        
        title = ('Epoch %i / %i | Reconstruction Cost = %f'%
                (epoch, training_epochs, reconstruction_cost))
        cost.append(reconstruction_cost)
        epoch_list.append(epoch)
        print(title)
        
    plt.title(title)
    plt.imshow(filters_grid, cmap='gray')
    display.clear_output(wait=True)
    display.display(fig)
    plot_loss_epoch()
    
    

Observation for Hyper Parameter Tuning for RBM:

Learning rate of 0.01 has been effective after observing the clarity of image in comparison to a learning rate of 0.1. Hence, Learning rate plays an importatnt role in improving the performace of the RBM . Though the reconstruction cost is yet high but we must also consider that this has been for trained for 50 epochs.

Reconstruction Cost = -528.190

Hyper Parameter Tuning RBM

Lets run the model for hyper parameter learning rate of 0.00001. The code to load the data, initla the hyper parameters and run the tensor flow session remains the same. Only change is the learning rate is set to 0.00001. Observe the change with comments

In [16]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

    
tf.reset_default_graph()

cost=[]
epoch_list=[]
# set learning rate to 0.00001
num_vis         = training_data[0].shape[0] #=784
num_hid         = 200
learning_rate   = 0.00001
batch_size      = 100
training_epochs = 50


rbm = xrbm.models.RBM(num_vis=num_vis, num_hid=num_hid, name='rbm_mnist')
# create mini batches
batch_idxs = np.random.permutation(range(len(training_data)))
n_batches  = len(batch_idxs) // batch_size
# create placeholder
batch_data     = tf.placeholder(tf.float32, shape=(None, num_vis))

cdapproximator = xrbm.train.CDApproximator(learning_rate=learning_rate)
train_op       = cdapproximator.train(rbm, vis_data=batch_data)

reconstructed_data,_,_,_ = rbm.gibbs_sample_vhv(batch_data)
xentropy_rec_cost  = xrbm.losses.cross_entropy(batch_data, reconstructed_data)

# Create figure first so that we use the same one to draw the filters on during the training
fig = plt.figure(figsize=(12,8))

#run the tensorflow session
with tf.Session() as sess:    
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        for batch_i in range(n_batches):
            # Get just minibatch amount of data
            idxs_i = batch_idxs[batch_i * batch_size:(batch_i + 1) * batch_size]
            
            # Run the training step
            sess.run(train_op, feed_dict={batch_data: training_data[idxs_i]})
    
        reconstruction_cost = sess.run(xentropy_rec_cost, feed_dict={batch_data: training_data})
        

        W = rbm.W.eval().transpose()
        filters_grid = create_2d_filters_grid(W, filter_shape=(28,28), grid_size=(10, 20), grid_gap=(1,1))
        
        title = ('Epoch %i / %i | Reconstruction Cost = %f'%
                (epoch, training_epochs, reconstruction_cost))
        print(title)
        cost.append(reconstruction_cost)
        epoch_list.append(epoch)
        
    plt.title(title)
    plt.imshow(filters_grid, cmap='gray')
    display.clear_output(wait=True)
    display.display(fig)
    plot_loss_epoch()
    
    

Observation for Hyper Parameter Tuning the RBM

The learning rate of 0.00001 increased the reconstruction cost and reduced the clarity of the image as well

reconstruction Cost = -600.41

FInal Observation for Learning Rate for RBM

Learning rate defintely has an effect on the model. But optimal learning rate as per the trials indiacate 0.01 and 0.1 only. Hence, this is a prospective parameter to tune for the RBM

Hyper Parameter Tuning the hidden layers for RBM

Lets tune the hidden layers for the RBM. Number of Hidden Layers selected are 500 and 100

Let start with Hidden Layers 500. The inital code of the model remains the same. Only change would num_hid =500 and change grid size to (10,50). Watch the comments to see the change

In [27]:
# reset the graph
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

    
tf.reset_default_graph()

cost=[]
epoch_list=[]

# change num_hid to 0.1
num_vis         = training_data[0].shape[0] #=784
num_hid         = 500
learning_rate   = 0.1
batch_size      = 100
training_epochs = 50

# set the model
rbm = xrbm.models.RBM(num_vis=num_vis, num_hid=num_hid, name='rbm_mnist')

batch_idxs = np.random.permutation(range(len(training_data)))
n_batches  = len(batch_idxs) // batch_size

#create the placeholder
batch_data     = tf.placeholder(tf.float32, shape=(None, num_vis))

cdapproximator = xrbm.train.CDApproximator(learning_rate=learning_rate)
train_op       = cdapproximator.train(rbm, vis_data=batch_data)

reconstructed_data,_,_,_ = rbm.gibbs_sample_vhv(batch_data)
xentropy_rec_cost  = xrbm.losses.cross_entropy(batch_data, reconstructed_data)

# Create figure first so that we use the same one to draw the filters on during the training
fig = plt.figure(figsize=(12,8))

with tf.Session() as sess:    
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        for batch_i in range(n_batches):
            # Get just minibatch amount of data
            idxs_i = batch_idxs[batch_i * batch_size:(batch_i + 1) * batch_size]
            
            # Run the training step
            sess.run(train_op, feed_dict={batch_data: training_data[idxs_i]})
    
        reconstruction_cost = sess.run(xentropy_rec_cost, feed_dict={batch_data: training_data})
        

        W = rbm.W.eval().transpose()
        # change the grid size dimension to (10,50)
        filters_grid = create_2d_filters_grid(W, filter_shape=(28,28), grid_size=(10, 50), grid_gap=(1,1))
        
        title = ('Epoch %i / %i | Reconstruction Cost = %f'%
                (epoch, training_epochs, reconstruction_cost))
        cost.append(reconstruction_cost)
        epoch_list.append(epoch)
        print(title)
        
    plt.title(title)
    plt.imshow(filters_grid, cmap='gray')
    display.clear_output(wait=True)
    display.display(fig)
    plot_loss_epoch()
    
    

Observation for Number of Hidden Layers for RBM

We observe that the images became extremely clear in comparison to the bench mark of th first model. More over the reconstruction cost also reduced drastically. This imples that number of hidden layers plays an important role in tuning

Hyper Parameter Tuning for learning rate

Lets set number of hidden layers to 100 observe the change in performanc. we will reuse the code of the initial model and just change the number of hidden layers to 100. See the comments closely for changes

In [28]:
# reset graph
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

    
tf.reset_default_graph()

cost=[]
epoch_list=[]
# set the hidden layers to 100
num_vis         = training_data[0].shape[0] #=784
num_hid         = 100
learning_rate   = 0.1
batch_size      = 100
training_epochs = 50

# apply hyper parameters to the model
rbm = xrbm.models.RBM(num_vis=num_vis, num_hid=num_hid, name='rbm_mnist')

batch_idxs = np.random.permutation(range(len(training_data)))
n_batches  = len(batch_idxs) // batch_size

# placeholder introduced
batch_data     = tf.placeholder(tf.float32, shape=(None, num_vis))

cdapproximator = xrbm.train.CDApproximator(learning_rate=learning_rate)
train_op       = cdapproximator.train(rbm, vis_data=batch_data)

reconstructed_data,_,_,_ = rbm.gibbs_sample_vhv(batch_data)
xentropy_rec_cost  = xrbm.losses.cross_entropy(batch_data, reconstructed_data)

# Create figure first so that we use the same one to draw the filters on during the training
fig = plt.figure(figsize=(12,8))

with tf.Session() as sess:    
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        for batch_i in range(n_batches):
            # Get just minibatch amount of data
            idxs_i = batch_idxs[batch_i * batch_size:(batch_i + 1) * batch_size]
            
            # Run the training step
            sess.run(train_op, feed_dict={batch_data: training_data[idxs_i]})
    
        reconstruction_cost = sess.run(xentropy_rec_cost, feed_dict={batch_data: training_data})
        

        W = rbm.W.eval().transpose()
#Change the grid size to (10,10b)
        filters_grid = create_2d_filters_grid(W, filter_shape=(28,28), grid_size=(10, 10), grid_gap=(1,1))
        
        title = ('Epoch %i / %i | Reconstruction Cost = %f'%
                (epoch, training_epochs, reconstruction_cost))
        cost.append(reconstruction_cost)
        epoch_list.append(epoch)
        print(title)
        
    plt.title(title)
    plt.imshow(filters_grid, cmap='gray')
    display.clear_output(wait=True)
    display.display(fig)
    plot_loss_epoch()
    
    

Observation for Hyper Parameter tuning Hidden Layers for the RBM

we observe that the loss increased when number of hidden layers decresed . The images are also not very clear. Hence, hidden layers as 500 was the most significant.

Final result for Hyper Parameter Tuning the Hidden Layers

It is clearly visible that the reconstruction cost decreased when number of hidden layers increased. Hence, number of hidden layers is a prospective parameter to tune for RBM ith larger number of hidden layers the better

Hyper Parameter Tuning Network Initialization for RBM

Now we will tune the network initialization . We will set a network initialization of to Xavier Intialization and notice if the reconstruction cost decreases or the image get clearer. The code remains the same from the initial model to load the data, initialize the hyper parameters and run the tensorflow session. The only change is that we need to add xavier Initlization as a parameter to the model. Observe the comments below to see the change.

In [7]:
# reset graph
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

    
tf.reset_default_graph()
# list to plot store epochs and cost
cost=[]
epoch_list=[]

num_vis         = training_data[0].shape[0] #=784
num_hid         = 100
learning_rate   = 0.1
batch_size      = 100
training_epochs = 50

## Here , set the initialization to Xavier Initialization as parameter to the model
rbm = xrbm.models.RBM(num_vis=num_vis, num_hid=num_hid, initializer=tf.contrib.layers.xavier_initializer(),name='rbm_mnist')

batch_idxs = np.random.permutation(range(len(training_data)))
n_batches  = len(batch_idxs) // batch_size

batch_data     = tf.placeholder(tf.float32, shape=(None, num_vis))

cdapproximator = xrbm.train.CDApproximator(learning_rate=learning_rate)
train_op       = cdapproximator.train(rbm, vis_data=batch_data)

reconstructed_data,_,_,_ = rbm.gibbs_sample_vhv(batch_data)
xentropy_rec_cost  = xrbm.losses.cross_entropy(batch_data, reconstructed_data)

# Create figure first so that we use the same one to draw the filters on during the training
fig = plt.figure(figsize=(12,8))

with tf.Session() as sess:    
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        for batch_i in range(n_batches):
            # Get just minibatch amount of data
            idxs_i = batch_idxs[batch_i * batch_size:(batch_i + 1) * batch_size]
            
            # Run the training step
            sess.run(train_op, feed_dict={batch_data: training_data[idxs_i]})
    
        reconstruction_cost = sess.run(xentropy_rec_cost, feed_dict={batch_data: training_data})
        

        W = rbm.W.eval().transpose()
        filters_grid = create_2d_filters_grid(W, filter_shape=(28,28), grid_size=(10, 10), grid_gap=(1,1))
        
        title = ('Epoch %i / %i | Reconstruction Cost = %f'%
                (epoch, training_epochs, reconstruction_cost))
        cost.append(reconstruction_cost)
        epoch_list.append(epoch)
        print(title)
 # plot th figure   
    plt.title(title)
    plt.imshow(filters_grid, cmap='gray')
    display.clear_output(wait=True)
    display.display(fig)
    plot_loss_epoch()
    
    

Observation for Hyper Parameter Tuning Network Initialization for RBM

It is observed that the reconstruction cost increases but the cost is also consistently decreasing for the model. Hence, we could consider the net work initialization as a prospective parameter to tune , even though it did not give very good results.

Reconstruction Cost=-527.42

Final result for Hyper Parameter Tuning Network Initialization for RBM

Even though the network initialization did not reduce the reconstruction cost, from the graph it can be observed that the cost is decreasing consistently with the number of epochs. Hence, it would be interesting to not if it had an impact as the number of epochs increase . Hence, I will conclude that Network Initialization is a prospective parameter to tune for the model.

Hyper Parameter Tuning for Regularization for RBM

Lets observe the impact of regularization on the RBM Network. We will also initialize the activation to sigmoid and retain the Xavier initialization . The code remains the same as the intial model. Only change is adding the reularization and the activation to the model. It has been done as mentioned below. Lok at the comments closely to observe changes.

In [8]:
#reset graph
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

    
tf.reset_default_graph()

cost=[]
epoch_list=[]

#set hyper parameters

num_vis         = training_data[0].shape[0] #=784
num_hid         = 100
learning_rate   = 0.1
batch_size      = 100
training_epochs = 50

#define model
#set the initializer as xavier initializer ad activation as sigmoid
rbm = xrbm.models.RBM(num_vis=num_vis, num_hid=num_hid, initializer=tf.contrib.layers.xavier_initializer(),activation=tf.nn.sigmoid,name='rbm_mnist')

batch_idxs = np.random.permutation(range(len(training_data)))
n_batches  = len(batch_idxs) // batch_size

batch_data     = tf.placeholder(tf.float32, shape=(None, num_vis))

# set the regularizer here as L1 
cdapproximator = xrbm.train.CDApproximator(learning_rate=learning_rate,regularizer=tf.contrib.layers.l1_regularizer(0.001))
train_op       = cdapproximator.train(rbm, vis_data=batch_data)

reconstructed_data,_,_,_ = rbm.gibbs_sample_vhv(batch_data)
xentropy_rec_cost  = xrbm.losses.cross_entropy(batch_data, reconstructed_data)

# Create figure first so that we use the same one to draw the filters on during the training
fig = plt.figure(figsize=(12,8))

 # run the tensorflow session
with tf.Session() as sess:    
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        for batch_i in range(n_batches):
            # Get just minibatch amount of data
            idxs_i = batch_idxs[batch_i * batch_size:(batch_i + 1) * batch_size]
            
            # Run the training step
            sess.run(train_op, feed_dict={batch_data: training_data[idxs_i]})
    
        reconstruction_cost = sess.run(xentropy_rec_cost, feed_dict={batch_data: training_data})
        

        W = rbm.W.eval().transpose()
        filters_grid = create_2d_filters_grid(W, filter_shape=(28,28), grid_size=(10, 10), grid_gap=(1,1))
        
        title = ('Epoch %i / %i | Reconstruction Cost = %f'%
                (epoch, training_epochs, reconstruction_cost))
        cost.append(reconstruction_cost)
        epoch_list.append(epoch)
        print(title)
#plot the graphs    
    plt.title(title)
    plt.imshow(filters_grid, cmap='gray')
    display.clear_output(wait=True)
    display.display(fig)
    plot_loss_epoch()
    
    

Observation:

The image is extremly clear displaying that regularization does play an imporatnt role for an RBM. Though the reconstruction cost is hight but there is a steady decrease in the cost over number of epochs. The combination of xavier initialization and activation isalso contributing to the clarity of the image.

As future scope it would be interesting to note the indiviual effects of each of the parameters on the RBM Network

Reconstruction Cost = -528.27

Final Observation for Regularization

The reconstruction cost is very high the image is also clear. Hence, it would be imporatnt to know the individual effects of each of the parameters on the RBM like activation. As of the current result I will consider Regularization as a prospective parameter to tune for the RBM

Final results of Hyper Parameter for RBM

  1. As it was depicted earlier learning rate of 0.1 and 0.01 were ideal for the performance of the RBM . Decreasing the learning rate decreased the performance of the mdoel as well. Hence learning rate is a prospective parameter

  2. Increase in the number of hidden layers decreased the reconstruction cost drastically. Hence number of hidden layers as 512 is prospective candidate

  3. Though Network Initialization did provide decrease in cost but due to steady decrease in cost this can be consiserd

  4. Regularization is an imporatant arameter as well. Though it ould be interesting to note the individual effect of activation on the network and regularization , the L1 regularization is a prospective candidate as well.

Summary image.png

General Adversarial Networks

Overview of GANs

Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks. This technique can generate photographs that look at least superficially authentic to human observers, having many realistic characteristics (though in tests people can tell real from generated in many cases).

Gans consist of a Generator and Discriminator. In essence the generates the images that are provided as inuts and the discriminator tries to identify the diffrences in the generated images from the actual images. The generator tries to increas the loss where as the discriminator tries to minimize the loss. The ideal case is when there are least diffrences between the images generated and the actuall images. Hence, the ideal scenario is when the discriminator loss is least.

Dataset Description

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. Each image as dimensions of28X28 images.The dataset consists of pair, “handwritten digit image” and “label”. Digit ranges from 0 to 9, meaning 10 patterns in total.

handwritten digit image: This is gray scale image with size 28 x 28 pixel. label : This is actual digit number this handwritten digit image represents. It is either 0 to 9.

This is popular dataset set used in Datascience and a popular Hello World Dataset

Dataset Download and Use

We will be using the MNIST Datset from the Tensorflow Library. This will be automatically used from the tensorflow code. No pre- requisites are present.

!image.png

Structure of the GAN used

This network uses the Xavier Initialization for weights and biases. The Generator and Discriminator use the relu activation function, with one hidden layer. It uses the solver Adam. The probability distribution is identified using the activation function sigmoid. The network was initialized with the 10000 epochs.

Tensorflow code for GAN

The ocde for tensor flow consists of the sasme as the previous. We will initalize place holders, weighst and bias , hyper parameter and start the Tensorflow session . In this case it would be dine for generator and Discriminatir. Lets walk through the code now.

Import necessary libraries as below

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
C:\Users\jaini\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

Next, we initialize the network with xavier initialization

In [2]:
def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)

Assign weights and bias to the generator and the discriminator as shown below

In [3]:
# weights and bias for the discriminator
X = tf.placeholder(tf.float32, shape=[None, 784])

D_W1 = tf.Variable(xavier_init([784, 128]))
D_b1 = tf.Variable(tf.zeros(shape=[128]))

D_W2 = tf.Variable(xavier_init([128, 1]))
D_b2 = tf.Variable(tf.zeros(shape=[1]))

theta_D = [D_W1, D_W2, D_b1, D_b2]

#Weights and bias for the generator
Z = tf.placeholder(tf.float32, shape=[None, 100])

G_W1 = tf.Variable(xavier_init([100, 128]))
G_b1 = tf.Variable(tf.zeros(shape=[128]))

G_W2 = tf.Variable(xavier_init([128, 784]))
G_b2 = tf.Variable(tf.zeros(shape=[784]))

theta_G = [G_W1, G_W2, G_b1, G_b2]

# function to create a random sample
def sample_Z(m, n):
    return np.random.uniform(-1., 1., size=[m, n])

Defining th layers for the generator and discriminator and the generator. We are giving activation function of sigmoid. Matmul operation is used to generate the images as matrix multiplication

In [4]:
def generator(z):
    G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)

    return G_prob
# The discriminator(x) takes MNIST image(s) and return a scalar which represents a probability of real MNIST image.
def discriminator(x):
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)

    return D_prob, D_logit

Function to plot images afetr training is defimned next

In [5]:
# Plot images

def plot(samples):
    fig = plt.figure(figsize=(4, 4))
    gs = gridspec.GridSpec(4, 4)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

    return fig

Next, we create functions to plot the loss vs epoch for the generator and discriminator

In [7]:
def plot_loss_epoch_generator():
    plt.figure(figsize=(18, 5))
    plt.subplot(1, 2, 1)
    plt.title('Train Loss vs Epoch', fontsize=15)
    plt.plot(epoch_list, g_train_loss, 'r-')
    plt.xlabel('Epoch')
    plt.ylabel('Train Loss')
    
def plot_loss_epoch_discriminator():
    plt.figure(figsize=(18, 5))
    plt.subplot(1, 2, 1)
    plt.title('Train Loss vs Epoch', fontsize=15)
    plt.plot(epoch_list, d_train_loss, 'r-')
    plt.xlabel('Epoch')
    plt.ylabel('Train Loss')

Evaluating the sample and storing values of the discriminator and generator

In [6]:
G_sample = generator(Z)
D_real, D_logit_real = discriminator(X)
D_fake, D_logit_fake = discriminator(G_sample)

Defining the cost function

In [7]:
# D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_real, labels=tf.ones_like(D_logit_real)))
# D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))

Defining the the optimizer for the genartor and discriminator

In [8]:
D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)

mb_size = 128
Z_dim = 100

Lets read the data that is downloaded automatically

In [9]:
mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)
Extracting ../../MNIST_data\train-images-idx3-ubyte.gz
Extracting ../../MNIST_data\train-labels-idx1-ubyte.gz
Extracting ../../MNIST_data\t10k-images-idx3-ubyte.gz
Extracting ../../MNIST_data\t10k-labels-idx1-ubyte.gz

Start the tensroflow session. Images generated will be stored ina folder out that will be automatically created. There is no absolute path. The name of the folder will be out

In [10]:
#empty lists are initialized
g_train_loss=[]
d_train_loss=[]
epoch_list=[]
sess = tf.Session()
sess.run(tf.global_variables_initializer())

#create a folder if it does not exist
if not os.path.exists('out/'):
    os.makedirs('out/')

i = 0

#start training. stote the images generated in the out folder
for it in range(10000):
    if it % 100 == 0:
        samples = sess.run(G_sample, feed_dict={Z: sample_Z(16, Z_dim)})

        fig = plot(samples)
        plt.savefig('out/{}.png'.format(str(i).zfill(3)), bbox_inches='tight')
        i += 1
        plt.close(fig)

    X_mb, _ = mnist.train.next_batch(mb_size)

    _, D_loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: sample_Z(mb_size, Z_dim)})

    if it % 1000 == 0:
#         epoch_list.append(it)
        print('Iter: {}'.format(it))
        print('D loss: {:.4}'. format(D_loss_curr))
        d_train_loss.append(D_loss_curr)
        print('G_loss: {:.4}'.format(G_loss_curr))
        d_train_loss.append(G_loss_curr)
        print()
#         plot_loss_epoch_generator()
#         plot_loss_epoch_discriminator()
Iter: 0
D loss: 1.313
G_loss: 2.847

Iter: 1000
D loss: 0.009645
G_loss: 10.16

Iter: 2000
D loss: 0.01006
G_loss: 6.534

Iter: 3000
D loss: 0.07838
G_loss: 7.171

Iter: 4000
D loss: 0.1197
G_loss: 5.155

Iter: 5000
D loss: 0.1275
G_loss: 5.418

Iter: 6000
D loss: 0.3744
G_loss: 4.084

Iter: 7000
D loss: 0.3194
G_loss: 4.489

Iter: 8000
D loss: 0.468
G_loss: 3.141

Iter: 9000
D loss: 0.6848
G_loss: 3.047

Observation of the initial model for GAN:

It is observed that in the beginning the generator loss is low and the discriminator loss is less. This is because initially the genrator is lerning the image and in this process the discriminator willfind alot of diffrences between the real image and the fake images. As the number of epochs increases the discrimator loss decreases and the generato loss increases. Below is the snapshot of the first image and the last image generated.

The final losses are as follows:

D loss: 0.6848 G_loss: 3.047

The first image

000.png

The Last image

097.png

The images will be present in the out folder that is automatically created in the path

Hyper parameter tuning GAN

Lets perform hyper parameter tuning for the GAN. Lets start by tuning the activation function.

Activation - Relu

Next , we are going to observe the performance when the activation function is Relu. The initial model code remains the same. Only the activation function for the generator and discriminator is chaged to Relu. Observe the change in the code. The images will be saved in out_activation_relu folder created automatically

In [29]:
def plot_loss_epoch_generator():
    plt.figure(figsize=(18, 5))
    plt.subplot(1, 2, 1)
    plt.title('Generator Train Loss vs Epoch', fontsize=15)
    plt.plot(epoch_list, g_train_loss, 'r-')
    plt.xlabel('Epoch')
    plt.ylabel('Generator Train Loss')
    
def plot_loss_epoch_discriminator():
    plt.figure(figsize=(18, 5))
    plt.subplot(1, 2, 1)
    plt.title('Discriminator Train Loss vs Epoch', fontsize=15)
    plt.plot(epoch_list, d_train_loss, 'r-')
    plt.xlabel('Epoch')
    plt.ylabel('Discriminator Train Loss')
In [23]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as npd_train_loss
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
%matplotlib inline

epoch_list=[]
d_train_loss=[]
g_train_loss=[]

def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)


X = tf.placeholder(tf.float32, shape=[None, 784])

D_W1 = tf.Variable(xavier_init([784, 128]))
D_b1 = tf.Variable(tf.zeros(shape=[128]))

D_W2 = tf.Variable(xavier_init([128, 1]))
D_b2 = tf.Variable(tf.zeros(shape=[1]))

theta_D = [D_W1, D_W2, D_b1, D_b2]


Z = tf.placeholder(tf.float32, shape=[None, 100])

G_W1 = tf.Variable(xavier_init([100, 128]))
G_b1 = tf.Variable(tf.zeros(shape=[128]))

G_W2 = tf.Variable(xavier_init([128, 784]))
G_b2 = tf.Variable(tf.zeros(shape=[784]))

theta_G = [G_W1, G_W2, G_b1, G_b2]


def sample_Z(m, n):
    return np.random.uniform(-1., 1., size=[m, n])


def generator(z):
    G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.relu(G_log_prob)

    return G_prob

#change activation to relu
def discriminator(x):
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.relu(D_logit)

    return D_prob, D_logit

#change activation to relu
def plot(samples):
    fig = plt.figure(figsize=(4, 4))
    gs = gridspec.GridSpec(4, 4)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

    return fig


G_sample = generator(Z)
D_real, D_logit_real = discriminator(X)
D_fake, D_logit_fake = discriminator(G_sample)

# D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
# G_loss = -tf.reduce_mean(tf.log(D_fake))

# Alternative losses:
# -------------------
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_real, labels=tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.ones_like(D_logit_fake)))

D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)
#minibatch size
mb_size = 128
Z_dim = 100

r#reading the dataset from tensorflo
mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

#creates a floder if it does not exist
if not os.path.exists('out_activation_relu/'):
    os.makedirs('out_activation_relu/')

i = 0

for it in range(10000):
    if it % 1000 == 0:
        samples = sess.run(G_sample, feed_dict={Z: sample_Z(16, Z_dim)})

        fig = plot(samples)
        plt.savefig('out_activation_relu/{}.png'.format(str(i).zfill(3)), bbox_inches='tight')
        i += 1
        plt.close(fig)

    X_mb, _ = mnist.train.next_batch(mb_size)

    _, D_loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: sample_Z(mb_size, Z_dim)})

    if it % 100 == 0:
        print('Iter: {}'.format(it))
        print('D loss: {:.4}'. format(D_loss_curr))
        print('G_loss: {:.4}'.format(G_loss_curr))
        print()
        d_train_loss.append(D_loss_curr)
        g_train_loss.append(G_loss_curr)
        epoch_list.append(it)
plot_loss_epoch_generator()
plot_loss_epoch_discriminator()
Extracting ../../MNIST_data\train-images-idx3-ubyte.gz
Extracting ../../MNIST_data\train-labels-idx1-ubyte.gz
Extracting ../../MNIST_data\t10k-images-idx3-ubyte.gz
Extracting ../../MNIST_data\t10k-labels-idx1-ubyte.gz
Iter: 0
D loss: 1.336
G_loss: 1.685

Iter: 100
D loss: 0.3314
G_loss: 9.206

Iter: 200
D loss: 0.09721
G_loss: 8.755

Iter: 300
D loss: 0.02709
G_loss: 6.324

Iter: 400
D loss: 0.0954
G_loss: 3.609

Iter: 500
D loss: 0.2162
G_loss: 1.915

Iter: 600
D loss: 0.1441
G_loss: 2.647

Iter: 700
D loss: 0.2246
G_loss: 4.48

Iter: 800
D loss: 0.06432
G_loss: 4.318

Iter: 900
D loss: 0.4801
G_loss: 3.234

Iter: 1000
D loss: 0.2009
G_loss: 3.471

Iter: 1100
D loss: 0.2617
G_loss: 3.247

Iter: 1200
D loss: 0.2294
G_loss: 4.274

Iter: 1300
D loss: 0.1425
G_loss: 4.29

Iter: 1400
D loss: 0.1535
G_loss: 3.964

Iter: 1500
D loss: 0.1367
G_loss: 4.227

Iter: 1600
D loss: 0.2915
G_loss: 3.292

Iter: 1700
D loss: 0.2084
G_loss: 3.408

Iter: 1800
D loss: 0.1915
G_loss: 4.106

Iter: 1900
D loss: 0.1947
G_loss: 3.427

Iter: 2000
D loss: 0.2659
G_loss: 3.625

Iter: 2100
D loss: 0.1507
G_loss: 3.616

Iter: 2200
D loss: 0.2391
G_loss: 3.55

Iter: 2300
D loss: 0.2367
G_loss: 3.697

Iter: 2400
D loss: 0.3515
G_loss: 3.52

Iter: 2500
D loss: 0.4138
G_loss: 2.917

Iter: 2600
D loss: 0.178
G_loss: 3.976

Iter: 2700
D loss: 0.2699
G_loss: 3.871

Iter: 2800
D loss: 0.4472
G_loss: 2.813

Iter: 2900
D loss: 0.2604
G_loss: 3.976

Iter: 3000
D loss: 0.5274
G_loss: 4.239

Iter: 3100
D loss: 0.2547
G_loss: 4.066

Iter: 3200
D loss: 0.2563
G_loss: 3.925

Iter: 3300
D loss: 0.242
G_loss: 3.71

Iter: 3400
D loss: 0.1643
G_loss: 4.07

Iter: 3500
D loss: 0.3709
G_loss: 3.918

Iter: 3600
D loss: 0.6505
G_loss: 2.659

Iter: 3700
D loss: 0.4378
G_loss: 3.672

Iter: 3800
D loss: 0.6418
G_loss: 2.591

Iter: 3900
D loss: 0.3114
G_loss: 3.599

Iter: 4000
D loss: 0.3648
G_loss: 3.47

Iter: 4100
D loss: 0.5191
G_loss: 3.486

Iter: 4200
D loss: 0.5224
G_loss: 3.24

Iter: 4300
D loss: 0.3768
G_loss: 3.212

Iter: 4400
D loss: 0.8205
G_loss: 3.951

Iter: 4500
D loss: 0.4739
G_loss: 3.397

Iter: 4600
D loss: 0.3863
G_loss: 3.692

Iter: 4700
D loss: 0.4794
G_loss: 2.919

Iter: 4800
D loss: 0.6302
G_loss: 2.827

Iter: 4900
D loss: 0.526
G_loss: 3.286

Iter: 5000
D loss: 0.4366
G_loss: 3.336

Iter: 5100
D loss: 0.6547
G_loss: 3.142

Iter: 5200
D loss: 0.4721
G_loss: 3.029

Iter: 5300
D loss: 0.6084
G_loss: 3.498

Iter: 5400
D loss: 0.586
G_loss: 3.246

Iter: 5500
D loss: 0.6771
G_loss: 2.598

Iter: 5600
D loss: 0.6747
G_loss: 2.468

Iter: 5700
D loss: 0.4795
G_loss: 2.54

Iter: 5800
D loss: 0.6323
G_loss: 2.339

Iter: 5900
D loss: 0.8066
G_loss: 2.574

Iter: 6000
D loss: 0.8562
G_loss: 2.703

Iter: 6100
D loss: 0.579
G_loss: 3.28

Iter: 6200
D loss: 0.8105
G_loss: 2.332

Iter: 6300
D loss: 0.737
G_loss: 2.487

Iter: 6400
D loss: 0.626
G_loss: 3.057

Iter: 6500
D loss: 0.5672
G_loss: 2.545

Iter: 6600
D loss: 0.8093
G_loss: 2.157

Iter: 6700
D loss: 0.869
G_loss: 2.12

Iter: 6800
D loss: 0.8809
G_loss: 2.754

Iter: 6900
D loss: 0.8031
G_loss: 2.457

Iter: 7000
D loss: 0.9536
G_loss: 1.854

Iter: 7100
D loss: 0.9532
G_loss: 2.415

Iter: 7200
D loss: 0.6636
G_loss: 2.139

Iter: 7300
D loss: 1.059
G_loss: 2.002

Iter: 7400
D loss: 0.9469
G_loss: 1.998

Iter: 7500
D loss: 0.9041
G_loss: 2.169

Iter: 7600
D loss: 1.385
G_loss: 1.548

Iter: 7700
D loss: 0.9032
G_loss: 1.984

Iter: 7800
D loss: 0.8166
G_loss: 1.712

Iter: 7900
D loss: 0.6375
G_loss: 2.279

Iter: 8000
D loss: 0.8181
G_loss: 2.423

Iter: 8100
D loss: 0.9324
G_loss: 2.175

Iter: 8200
D loss: 0.7025
G_loss: 2.269

Iter: 8300
D loss: 0.9431
G_loss: 2.213

Iter: 8400
D loss: 0.8428
G_loss: 2.098

Iter: 8500
D loss: 1.101
G_loss: 2.148

Iter: 8600
D loss: 0.9127
G_loss: 2.294

Iter: 8700
D loss: 0.7616
G_loss: 2.644

Iter: 8800
D loss: 0.8791
G_loss: 2.28

Iter: 8900
D loss: 1.015
G_loss: 1.851

Iter: 9000
D loss: 0.8484
G_loss: 2.142

Iter: 9100
D loss: 1.018
G_loss: 1.783

Iter: 9200
D loss: 0.8733
G_loss: 2.059

Iter: 9300
D loss: 0.877
G_loss: 2.143

Iter: 9400
D loss: 0.8645
G_loss: 2.145

Iter: 9500
D loss: 0.6899
G_loss: 2.756

Iter: 9600
D loss: 0.7883
G_loss: 2.807

Iter: 9700
D loss: 0.7808
G_loss: 2.046

Iter: 9800
D loss: 1.082
G_loss: 1.768

Iter: 9900
D loss: 0.9232
G_loss: 2.121

HIIII
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000, 4100, 4200, 4300, 4400, 4500, 4600, 4700, 4800, 4900, 5000, 5100, 5200, 5300, 5400, 5500, 5600, 5700, 5800, 5900, 6000, 6100, 6200, 6300, 6400, 6500, 6600, 6700, 6800, 6900, 7000, 7100, 7200, 7300, 7400, 7500, 7600, 7700, 7800, 7900, 8000, 8100, 8200, 8300, 8400, 8500, 8600, 8700, 8800, 8900, 9000, 9100, 9200, 9300, 9400, 9500, 9600, 9700, 9800, 9900]
[1.684881, 9.206442, 8.754868, 6.32364, 3.608755, 1.9152205, 2.6473408, 4.4795046, 4.3178244, 3.2336397, 3.4707947, 3.2472594, 4.2744074, 4.2901716, 3.9641361, 4.226803, 3.2921498, 3.4081476, 4.106284, 3.4273083, 3.6252387, 3.6158168, 3.5504122, 3.696898, 3.5199876, 2.9172125, 3.976459, 3.8708282, 2.8133705, 3.9760838, 4.238699, 4.066002, 3.924694, 3.7104528, 4.069698, 3.9176111, 2.6586764, 3.6723275, 2.5913377, 3.5993729, 3.4703317, 3.4857392, 3.2397006, 3.2118886, 3.9509768, 3.3969688, 3.6923587, 2.918757, 2.8272676, 3.286473, 3.335577, 3.1419916, 3.0286913, 3.4984899, 3.2458038, 2.5978596, 2.4682178, 2.5402393, 2.3391514, 2.5743647, 2.7032127, 3.2798328, 2.3318403, 2.4874115, 3.0573516, 2.5452223, 2.156884, 2.119763, 2.754346, 2.4572, 1.8543271, 2.4153051, 2.1394491, 2.0021648, 1.9984959, 2.1687324, 1.5482419, 1.9841523, 1.711902, 2.2790656, 2.4228096, 2.1746378, 2.2685108, 2.2131615, 2.0977077, 2.1484997, 2.294256, 2.644086, 2.2797084, 1.8510472, 2.142312, 1.7831624, 2.059167, 2.1429417, 2.144662, 2.7564344, 2.8072648, 2.0457158, 1.7677, 2.1214027]

Observation

It is obeserved that the generator loss increases and the discrimnator loss decreases as number of epochs increases.

The first image 000.png

last Image 009.png

Hence, having the activation sigmoid was better as the image was clear, but rely can be considered a good possibility.

Loss D loss: 0.9232 G_loss: 2.121

Activation Function Leaky Relu

Next lets coonsider the activation function Leaky relu. The initial model code remains the same as we see below. Only the activation function will change. Obseve the comments closely to see the change made.

In [28]:
# Hyper parameter tuning Activation Function Leaky Relu

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as npd_train_loss
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
%matplotlib inline

epoch_list=[]
d_train_loss=[]
g_train_loss=[]

def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)


X = tf.placeholder(tf.float32, shape=[None, 784])

D_W1 = tf.Variable(xavier_init([784, 128]))
D_b1 = tf.Variable(tf.zeros(shape=[128]))

D_W2 = tf.Variable(xavier_init([128, 1]))
D_b2 = tf.Variable(tf.zeros(shape=[1]))

theta_D = [D_W1, D_W2, D_b1, D_b2]


Z = tf.placeholder(tf.float32, shape=[None, 100])

G_W1 = tf.Variable(xavier_init([100, 128]))
G_b1 = tf.Variable(tf.zeros(shape=[128]))

G_W2 = tf.Variable(xavier_init([128, 784]))
G_b2 = tf.Variable(tf.zeros(shape=[784]))

theta_G = [G_W1, G_W2, G_b1, G_b2]


def sample_Z(m, n):
    return np.random.uniform(-1., 1., size=[m, n])

#activation function leaky relu for generator
def generator(z):
    G_h1 = tf.nn.leaky_relu(tf.matmul(z, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.leaky_relu(G_log_prob)

    return G_prob

#activation function for leaky relu for discriminator
def discriminator(x):
    D_h1 = tf.nn.leaky_relu(tf.matmul(x, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.leaky_relu(D_logit)

    return D_prob, D_logit


def plot(samples):
    fig = plt.figure(figsize=(4, 4))
    gs = gridspec.GridSpec(4, 4)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

    return fig


G_sample = generator(Z)
D_real, D_logit_real = discriminator(X)
D_fake, D_logit_fake = discriminator(G_sample)

# D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
# G_loss = -tf.reduce_mean(tf.log(D_fake))

# Alternative losses:
# -------------------
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_real, labels=tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.ones_like(D_logit_fake)))

D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)

mb_size = 128
Z_dim = 100

mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

if not os.path.exists('out_activation_leakyrelu/'):
    os.makedirs('out_activation_leakyrelu/')

i = 0

for it in range(10000):
    if it % 1000 == 0:
        samples = sess.run(G_sample, feed_dict={Z: sample_Z(16, Z_dim)})

        fig = plot(samples)
        plt.savefig('out_activation_leakyrelu/{}.png'.format(str(i).zfill(3)), bbox_inches='tight')
        i += 1
        plt.close(fig)

    X_mb, _ = mnist.train.next_batch(mb_size)

    _, D_loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: sample_Z(mb_size, Z_dim)})

    if it % 100 == 0:
        print('Iter: {}'.format(it))
        print('D loss: {:.4}'. format(D_loss_curr))
        print('G_loss: {:.4}'.format(G_loss_curr))
        print()
        d_train_loss.append(D_loss_curr)
        g_train_loss.append(G_loss_curr)
        epoch_list.append(it)
plot_loss_epoch_generator()
plot_loss_epoch_discriminator()
Extracting ../../MNIST_data\train-images-idx3-ubyte.gz
Extracting ../../MNIST_data\train-labels-idx1-ubyte.gz
Extracting ../../MNIST_data\t10k-images-idx3-ubyte.gz
Extracting ../../MNIST_data\t10k-labels-idx1-ubyte.gz
Iter: 0
D loss: 1.392
G_loss: 1.922

Iter: 100
D loss: 0.7105
G_loss: 6.975

Iter: 200
D loss: 0.06168
G_loss: 7.966

Iter: 300
D loss: 0.07169
G_loss: 3.899

Iter: 400
D loss: 0.1881
G_loss: 2.779

Iter: 500
D loss: 0.6211
G_loss: 1.782

Iter: 600
D loss: 0.7817
G_loss: 2.361

Iter: 700
D loss: 0.315
G_loss: 3.543

Iter: 800
D loss: 0.5474
G_loss: 2.345

Iter: 900
D loss: 0.6723
G_loss: 3.518

Iter: 1000
D loss: 0.4846
G_loss: 3.759

Iter: 1100
D loss: 0.8625
G_loss: 2.311

Iter: 1200
D loss: 0.5417
G_loss: 2.901

Iter: 1300
D loss: 0.4103
G_loss: 2.701

Iter: 1400
D loss: 0.5784
G_loss: 2.565

Iter: 1500
D loss: 0.6983
G_loss: 3.357

Iter: 1600
D loss: 0.4285
G_loss: 3.603

Iter: 1700
D loss: 0.9876
G_loss: 3.17

Iter: 1800
D loss: 0.5238
G_loss: 3.417

Iter: 1900
D loss: 0.6294
G_loss: 2.987

Iter: 2000
D loss: 0.8653
G_loss: 3.85

Iter: 2100
D loss: 0.7787
G_loss: 3.964

Iter: 2200
D loss: 0.6861
G_loss: 3.268

Iter: 2300
D loss: 0.6172
G_loss: 2.998

Iter: 2400
D loss: 0.6417
G_loss: 3.188

Iter: 2500
D loss: 0.9468
G_loss: 2.711

Iter: 2600
D loss: 0.4594
G_loss: 3.575

Iter: 2700
D loss: 0.659
G_loss: 2.738

Iter: 2800
D loss: 1.14
G_loss: 3.666

Iter: 2900
D loss: 1.046
G_loss: 3.345

Iter: 3000
D loss: 1.17
G_loss: 2.814

Iter: 3100
D loss: 0.8299
G_loss: 2.791

Iter: 3200
D loss: 0.7069
G_loss: 3.083

Iter: 3300
D loss: 1.135
G_loss: 3.095

Iter: 3400
D loss: 1.054
G_loss: 2.62

Iter: 3500
D loss: 0.8511
G_loss: 3.02

Iter: 3600
D loss: 1.33
G_loss: 2.482

Iter: 3700
D loss: 1.14
G_loss: 3.026

Iter: 3800
D loss: 0.7145
G_loss: 2.495

Iter: 3900
D loss: 1.219
G_loss: 2.003

Iter: 4000
D loss: 0.9043
G_loss: 2.796

Iter: 4100
D loss: 1.118
G_loss: 2.195

Iter: 4200
D loss: 1.175
G_loss: 2.574

Iter: 4300
D loss: 1.109
G_loss: 2.249

Iter: 4400
D loss: 0.6314
G_loss: 2.635

Iter: 4500
D loss: 1.306
G_loss: 2.311

Iter: 4600
D loss: 1.004
G_loss: 3.043

Iter: 4700
D loss: 0.9565
G_loss: 2.446

Iter: 4800
D loss: 1.325
G_loss: 2.227

Iter: 4900
D loss: 1.288
G_loss: 1.571

Iter: 5000
D loss: 1.685
G_loss: 1.918

Iter: 5100
D loss: 1.349
G_loss: 1.604

Iter: 5200
D loss: 1.448
G_loss: 1.658

Iter: 5300
D loss: 1.008
G_loss: 2.491

Iter: 5400
D loss: 1.47
G_loss: 2.037

Iter: 5500
D loss: 1.25
G_loss: 2.216

Iter: 5600
D loss: 1.692
G_loss: 1.459

Iter: 5700
D loss: 1.378
G_loss: 2.085

Iter: 5800
D loss: 1.891
G_loss: 1.406

Iter: 5900
D loss: 1.009
G_loss: 1.911

Iter: 6000
D loss: 1.202
G_loss: 1.795

Iter: 6100
D loss: 1.491
G_loss: 1.783

Iter: 6200
D loss: 1.495
G_loss: 1.852

Iter: 6300
D loss: 1.123
G_loss: 1.92

Iter: 6400
D loss: 1.317
G_loss: 1.752

Iter: 6500
D loss: 1.509
G_loss: 1.884

Iter: 6600
D loss: 1.838
G_loss: 1.289

Iter: 6700
D loss: 1.548
G_loss: 1.451

Iter: 6800
D loss: 1.576
G_loss: 1.719

Iter: 6900
D loss: 1.309
G_loss: 2.068

Iter: 7000
D loss: 1.858
G_loss: 1.349

Iter: 7100
D loss: 1.041
G_loss: 1.761

Iter: 7200
D loss: 1.557
G_loss: 1.012

Iter: 7300
D loss: 1.671
G_loss: 1.142

Iter: 7400
D loss: 1.343
G_loss: 1.515

Iter: 7500
D loss: 1.699
G_loss: 1.293

Iter: 7600
D loss: 1.66
G_loss: 1.244

Iter: 7700
D loss: 1.429
G_loss: 1.334

Iter: 7800
D loss: 1.455
G_loss: 0.9731

Iter: 7900
D loss: 1.449
G_loss: 1.305

Iter: 8000
D loss: 1.818
G_loss: 1.317

Iter: 8100
D loss: 1.259
G_loss: 1.47

Iter: 8200
D loss: 1.357
G_loss: 1.29

Iter: 8300
D loss: 1.482
G_loss: 1.085

Iter: 8400
D loss: 1.661
G_loss: 1.191

Iter: 8500
D loss: 1.296
G_loss: 1.256

Iter: 8600
D loss: 2.011
G_loss: 1.037

Iter: 8700
D loss: 1.783
G_loss: 1.601

Iter: 8800
D loss: 1.757
G_loss: 1.325

Iter: 8900
D loss: 1.861
G_loss: 0.8322

Iter: 9000
D loss: 1.374
G_loss: 1.492

Iter: 9100
D loss: 1.366
G_loss: 1.359

Iter: 9200
D loss: 1.377
G_loss: 1.204

Iter: 9300
D loss: 1.554
G_loss: 1.218

Iter: 9400
D loss: 1.506
G_loss: 1.148

Iter: 9500
D loss: 1.757
G_loss: 0.8195

Iter: 9600
D loss: 1.583
G_loss: 1.007

Iter: 9700
D loss: 1.609
G_loss: 1.463

Iter: 9800
D loss: 1.742
G_loss: 0.9896

Iter: 9900
D loss: 1.518
G_loss: 1.112

Observation:

It can be clearly observed that the discriminator loss gradually decreases as the generator loss increases. Lets have a look at the images generated :

1st Image on the first epoch:

000.png

Last Image on the last epoch 009.png

D loss: 1.518 G_loss: 1.112

Final Observation for Activation function

Leaky Relu and Relu have nearly the same outputs for 7000 epochs, but it is much clear with the sigmoid activation. Hence, the sigmoid activation function most preferred. Hnce, activation is an imporatnt parameter to tune for the GAN

Hyper parameter tuning Optimizer

Now lets, observe the combination of optimizer and the learning rate. We will first consider the optimizer Stochastic Gradient Descent and Lesrning rate of 0.001

The code remains the same as the initial model. The only changes will be with the optimizer and the learning rate. Observe the change in the code by following the comments

In [37]:
# Hyper parameter tuning Activation Function Leaky Relu

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as npd_train_loss
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
%matplotlib inline

epoch_list=[]
d_train_loss=[]
g_train_loss=[]

def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)


X = tf.placeholder(tf.float32, shape=[None, 784])

D_W1 = tf.Variable(xavier_init([784, 128]))
D_b1 = tf.Variable(tf.zeros(shape=[128]))

D_W2 = tf.Variable(xavier_init([128, 1]))
D_b2 = tf.Variable(tf.zeros(shape=[1]))

theta_D = [D_W1, D_W2, D_b1, D_b2]


Z = tf.placeholder(tf.float32, shape=[None, 100])

G_W1 = tf.Variable(xavier_init([100, 128]))
G_b1 = tf.Variable(tf.zeros(shape=[128]))

G_W2 = tf.Variable(xavier_init([128, 784]))
G_b2 = tf.Variable(tf.zeros(shape=[784]))

theta_G = [G_W1, G_W2, G_b1, G_b2]


def sample_Z(m, n):
    return np.random.uniform(-1., 1., size=[m, n])


def generator(z):
    G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)

    return G_prob


def discriminator(x):
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)

    return D_prob, D_logit


def plot(samples):
    fig = plt.figure(figsize=(4, 4))
    gs = gridspec.GridSpec(4, 4)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

    return fig


G_sample = generator(Z)
D_real, D_logit_real = discriminator(X)
D_fake, D_logit_fake = discriminator(G_sample)

# D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
# G_loss = -tf.reduce_mean(tf.log(D_fake))

# Alternative losses:
# -------------------
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_real, labels=tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.ones_like(D_logit_fake)))

#change the optimizer with the learning rate here . Set optimizer to SGD and learning rate to 0.001
D_solver = tf.train.GradientDescentOptimizer(0.001).minimize(D_loss, var_list=theta_D)
G_solver = tf.train.GradientDescentOptimizer(0.001).minimize(G_loss, var_list=theta_G)

mb_size = 128
Z_dim = 100

mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

if not os.path.exists('out_optimizer_SGD/'):
    os.makedirs('out_optimizer_SGD/')

i = 0

for it in range(10000):
    if it % 1000 == 0:
        samples = sess.run(G_sample, feed_dict={Z: sample_Z(16, Z_dim)})

        fig = plot(samples)
        plt.savefig('out_optimizer_SGD/{}.png'.format(str(i).zfill(3)), bbox_inches='tight')
        i += 1
        plt.close(fig)

    X_mb, _ = mnist.train.next_batch(mb_size)

    _, D_loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: sample_Z(mb_size, Z_dim)})

    if it % 100 == 0:
        print('Iter: {}'.format(it))
        print('D loss: {:.4}'. format(D_loss_curr))
        print('G_loss: {:.4}'.format(G_loss_curr))
        print()
        d_train_loss.append(D_loss_curr)
        g_train_loss.append(G_loss_curr)
        epoch_list.append(it)
plot_loss_epoch_generator()
plot_loss_epoch_discriminator()
Extracting ../../MNIST_data\train-images-idx3-ubyte.gz
Extracting ../../MNIST_data\train-labels-idx1-ubyte.gz
Extracting ../../MNIST_data\t10k-images-idx3-ubyte.gz
Extracting ../../MNIST_data\t10k-labels-idx1-ubyte.gz
Iter: 0
D loss: 1.496
G_loss: 1.147

Iter: 100
D loss: 0.6683
G_loss: 2.101

Iter: 200
D loss: 0.408
G_loss: 2.533

Iter: 300
D loss: 0.2677
G_loss: 2.873

Iter: 400
D loss: 0.1951
G_loss: 3.112

Iter: 500
D loss: 0.1576
G_loss: 3.334

Iter: 600
D loss: 0.1412
G_loss: 3.567

Iter: 700
D loss: 0.1129
G_loss: 3.61

Iter: 800
D loss: 0.08827
G_loss: 3.712

Iter: 900
D loss: 0.09255
G_loss: 3.873

Iter: 1000
D loss: 0.08844
G_loss: 3.885

Iter: 1100
D loss: 0.07291
G_loss: 3.951

Iter: 1200
D loss: 0.07334
G_loss: 3.955

Iter: 1300
D loss: 0.06445
G_loss: 4.128

Iter: 1400
D loss: 0.05925
G_loss: 4.118

Iter: 1500
D loss: 0.06032
G_loss: 4.088

Iter: 1600
D loss: 0.05173
G_loss: 4.107

Iter: 1700
D loss: 0.06006
G_loss: 4.158

Iter: 1800
D loss: 0.05619
G_loss: 4.171

Iter: 1900
D loss: 0.05852
G_loss: 4.317

Iter: 2000
D loss: 0.05527
G_loss: 4.138

Iter: 2100
D loss: 0.05958
G_loss: 4.134

Iter: 2200
D loss: 0.06368
G_loss: 4.266

Iter: 2300
D loss: 0.0603
G_loss: 4.198

Iter: 2400
D loss: 0.0592
G_loss: 4.12

Iter: 2500
D loss: 0.06562
G_loss: 4.351

Iter: 2600
D loss: 0.05579
G_loss: 4.251

Iter: 2700
D loss: 0.07252
G_loss: 4.199

Iter: 2800
D loss: 0.06773
G_loss: 4.312

Iter: 2900
D loss: 0.06119
G_loss: 4.24

Iter: 3000
D loss: 0.06819
G_loss: 4.301

Iter: 3100
D loss: 0.07121
G_loss: 4.341

Iter: 3200
D loss: 0.06441
G_loss: 4.444

Iter: 3300
D loss: 0.07932
G_loss: 4.348

Iter: 3400
D loss: 0.06702
G_loss: 4.301

Iter: 3500
D loss: 0.07428
G_loss: 4.334

Iter: 3600
D loss: 0.08296
G_loss: 4.324

Iter: 3700
D loss: 0.08217
G_loss: 4.039

Iter: 3800
D loss: 0.09762
G_loss: 4.273

Iter: 3900
D loss: 0.08281
G_loss: 4.124

Iter: 4000
D loss: 0.08585
G_loss: 4.219

Iter: 4100
D loss: 0.08849
G_loss: 4.127

Iter: 4200
D loss: 0.09266
G_loss: 4.297

Iter: 4300
D loss: 0.09162
G_loss: 4.138

Iter: 4400
D loss: 0.08549
G_loss: 4.151

Iter: 4500
D loss: 0.1014
G_loss: 3.954

Iter: 4600
D loss: 0.1067
G_loss: 3.974

Iter: 4700
D loss: 0.1067
G_loss: 4.032

Iter: 4800
D loss: 0.1162
G_loss: 4.059

Iter: 4900
D loss: 0.107
G_loss: 4.033

Iter: 5000
D loss: 0.1098
G_loss: 3.827

Iter: 5100
D loss: 0.1317
G_loss: 3.964

Iter: 5200
D loss: 0.1219
G_loss: 3.93

Iter: 5300
D loss: 0.1253
G_loss: 3.879

Iter: 5400
D loss: 0.1181
G_loss: 3.85

Iter: 5500
D loss: 0.1104
G_loss: 4.07

Iter: 5600
D loss: 0.1297
G_loss: 3.723

Iter: 5700
D loss: 0.1318
G_loss: 3.804

Iter: 5800
D loss: 0.1365
G_loss: 3.72

Iter: 5900
D loss: 0.1459
G_loss: 3.78

Iter: 6000
D loss: 0.1352
G_loss: 3.426

Iter: 6100
D loss: 0.1276
G_loss: 3.457

Iter: 6200
D loss: 0.1382
G_loss: 3.394

Iter: 6300
D loss: 0.1309
G_loss: 3.572

Iter: 6400
D loss: 0.1624
G_loss: 3.503

Iter: 6500
D loss: 0.1682
G_loss: 3.561

Iter: 6600
D loss: 0.169
G_loss: 3.348

Iter: 6700
D loss: 0.1596
G_loss: 3.258

Iter: 6800
D loss: 0.1525
G_loss: 3.196

Iter: 6900
D loss: 0.1742
G_loss: 3.256

Iter: 7000
D loss: 0.1722
G_loss: 3.128

Iter: 7100
D loss: 0.1872
G_loss: 3.128

Iter: 7200
D loss: 0.1794
G_loss: 3.109

Iter: 7300
D loss: 0.1732
G_loss: 3.042

Iter: 7400
D loss: 0.1858
G_loss: 3.103

Iter: 7500
D loss: 0.1635
G_loss: 3.121

Iter: 7600
D loss: 0.1809
G_loss: 3.142

Iter: 7700
D loss: 0.1798
G_loss: 3.116

Iter: 7800
D loss: 0.1788
G_loss: 3.069

Iter: 7900
D loss: 0.1842
G_loss: 2.976

Iter: 8000
D loss: 0.186
G_loss: 3.004

Iter: 8100
D loss: 0.1749
G_loss: 2.996

Iter: 8200
D loss: 0.1858
G_loss: 2.947

Iter: 8300
D loss: 0.1796
G_loss: 2.958

Iter: 8400
D loss: 0.2116
G_loss: 2.892

Iter: 8500
D loss: 0.2118
G_loss: 2.843

Iter: 8600
D loss: 0.2173
G_loss: 2.832

Iter: 8700
D loss: 0.2567
G_loss: 2.794

Iter: 8800
D loss: 0.2194
G_loss: 2.675

Iter: 8900
D loss: 0.2631
G_loss: 2.549

Iter: 9000
D loss: 0.2763
G_loss: 2.455

Iter: 9100
D loss: 0.2811
G_loss: 2.579

Iter: 9200
D loss: 0.2721
G_loss: 2.526

Iter: 9300
D loss: 0.2532
G_loss: 2.471

Iter: 9400
D loss: 0.3068
G_loss: 2.446

Iter: 9500
D loss: 0.3193
G_loss: 2.557

Iter: 9600
D loss: 0.3109
G_loss: 2.498

Iter: 9700
D loss: 0.3197
G_loss: 2.43

Iter: 9800
D loss: 0.2932
G_loss: 2.528

Iter: 9900
D loss: 0.2892
G_loss: 2.531

Observation:

It is interesting to observe that there is a distinct rise and fall in the Discriminator and generator losses using the SGD Optimizer with a learning rate of 0.001.

Comparing the first and last image 000.png

009.png

The images are fairly blurred indicating that the SGD was not extremely effective with a learning rate of 0.01. Lets try and reduce the learning rate and observe if that had an effect. But it must be observed that the discriminator loss decreased considerably

But it is clear that tuning the optimizer is extremy important

D loss: 0.2892 G_loss: 2.531

Hyper parameter tuning Optimizer

Lets change the learning rate to 0.1 with the optimizer as SGD

The initial code for he model remains the same. Only diffrence is in the learning rate. Observe the comments to view the changes made

In [42]:
# Hyper parameter tuning Activation Function Leaky Relu

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as npd_train_loss
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
%matplotlib inline

epoch_list=[]
d_train_loss=[]
g_train_loss=[]

def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)


X = tf.placeholder(tf.float32, shape=[None, 784])

D_W1 = tf.Variable(xavier_init([784, 128]))
D_b1 = tf.Variable(tf.zeros(shape=[128]))

D_W2 = tf.Variable(xavier_init([128, 1]))
D_b2 = tf.Variable(tf.zeros(shape=[1]))

theta_D = [D_W1, D_W2, D_b1, D_b2]


Z = tf.placeholder(tf.float32, shape=[None, 100])

G_W1 = tf.Variable(xavier_init([100, 128]))
G_b1 = tf.Variable(tf.zeros(shape=[128]))

G_W2 = tf.Variable(xavier_init([128, 784]))
G_b2 = tf.Variable(tf.zeros(shape=[784]))

theta_G = [G_W1, G_W2, G_b1, G_b2]


def sample_Z(m, n):
    return np.random.uniform(-1., 1., size=[m, n])


def generator(z):
    G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)

    return G_prob


def discriminator(x):
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)

    return D_prob, D_logit


def plot(samples):
    fig = plt.figure(figsize=(4, 4))
    gs = gridspec.GridSpec(4, 4)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

    return fig


G_sample = generator(Z)
D_real, D_logit_real = discriminator(X)
D_fake, D_logit_fake = discriminator(G_sample)

# D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
# G_loss = -tf.reduce_mean(tf.log(D_fake))

# Alternative losses:
# -------------------
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_real, labels=tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.ones_like(D_logit_fake)))
# change the optimizer to SGD and the learning rate to 0.1
D_solver = tf.train.GradientDescentOptimizer(0.1).minimize(D_loss, var_list=theta_D)
G_solver = tf.train.GradientDescentOptimizer(0.1).minimize(G_loss, var_list=theta_G)

mb_size = 128
Z_dim = 100

mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# this folder will be automatically created and the images will be stored here
if not os.path.exists('out_optimizer_SGD_LR/'):
    os.makedirs('out_optimizer_SGD_LR/')

i = 0

for it in range(10000):
    if it % 1000 == 0:
        samples = sess.run(G_sample, feed_dict={Z: sample_Z(16, Z_dim)})

        fig = plot(samples)
        plt.savefig('out_optimizer_SGD_LR/{}.png'.format(str(i).zfill(3)), bbox_inches='tight')
        i += 1
        plt.close(fig)

    X_mb, _ = mnist.train.next_batch(mb_size)

    _, D_loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: sample_Z(mb_size, Z_dim)})

    if it % 100 == 0:
        print('Iter: {}'.format(it))
        print('D loss: {:.4}'. format(D_loss_curr))
        print('G_loss: {:.4}'.format(G_loss_curr))
        print()
        d_train_loss.append(D_loss_curr)
        g_train_loss.append(G_loss_curr)
        epoch_list.append(it)
plot_loss_epoch_generator()
plot_loss_epoch_discriminator()
Extracting ../../MNIST_data\train-images-idx3-ubyte.gz
Extracting ../../MNIST_data\train-labels-idx1-ubyte.gz
Extracting ../../MNIST_data\t10k-images-idx3-ubyte.gz
Extracting ../../MNIST_data\t10k-labels-idx1-ubyte.gz
Iter: 0
D loss: 1.471
G_loss: 8.117

Iter: 100
D loss: 0.397
G_loss: 3.628

Iter: 200
D loss: 0.1916
G_loss: 2.416

Iter: 300
D loss: 0.1806
G_loss: 2.654

Iter: 400
D loss: 0.1231
G_loss: 5.375

Iter: 500
D loss: 0.1625
G_loss: 3.624

Iter: 600
D loss: 0.2894
G_loss: 3.107

Iter: 700
D loss: 0.2357
G_loss: 4.295

Iter: 800
D loss: 0.1666
G_loss: 3.844

Iter: 900
D loss: 0.1626
G_loss: 3.264

Iter: 1000
D loss: 0.2386
G_loss: 2.485

Iter: 1100
D loss: 0.1268
G_loss: 3.817

Iter: 1200
D loss: 0.08857
G_loss: 3.906

Iter: 1300
D loss: 0.09735
G_loss: 3.61

Iter: 1400
D loss: 0.1625
G_loss: 3.304

Iter: 1500
D loss: 0.1225
G_loss: 3.47

Iter: 1600
D loss: 0.2005
G_loss: 3.598

Iter: 1700
D loss: 0.2317
G_loss: 4.803

Iter: 1800
D loss: 0.1465
G_loss: 3.801

Iter: 1900
D loss: 0.09215
G_loss: 3.73

Iter: 2000
D loss: 0.1685
G_loss: 4.954

Iter: 2100
D loss: 0.04965
G_loss: 3.91

Iter: 2200
D loss: 0.1682
G_loss: 4.311

Iter: 2300
D loss: 0.2204
G_loss: 3.69

Iter: 2400
D loss: 0.268
G_loss: 4.308

Iter: 2500
D loss: 0.1846
G_loss: 3.403

Iter: 2600
D loss: 0.3414
G_loss: 2.009

Iter: 2700
D loss: 0.2491
G_loss: 3.258

Iter: 2800
D loss: 0.2117
G_loss: 3.655

Iter: 2900
D loss: 0.181
G_loss: 4.929

Iter: 3000
D loss: 0.1193
G_loss: 3.819

Iter: 3100
D loss: 0.225
G_loss: 3.397

Iter: 3200
D loss: 0.207
G_loss: 4.884

Iter: 3300
D loss: 0.1231
G_loss: 3.332

Iter: 3400
D loss: 0.2126
G_loss: 3.221

Iter: 3500
D loss: 0.1306
G_loss: 3.439

Iter: 3600
D loss: 0.1818
G_loss: 3.014

Iter: 3700
D loss: 0.2261
G_loss: 3.533

Iter: 3800
D loss: 0.1733
G_loss: 3.534

Iter: 3900
D loss: 0.2902
G_loss: 4.687

Iter: 4000
D loss: 0.281
G_loss: 3.176

Iter: 4100
D loss: 0.2146
G_loss: 3.312

Iter: 4200
D loss: 0.2558
G_loss: 2.894

Iter: 4300
D loss: 0.2632
G_loss: 2.911

Iter: 4400
D loss: 0.2533
G_loss: 3.097

Iter: 4500
D loss: 0.2599
G_loss: 2.71

Iter: 4600
D loss: 0.2937
G_loss: 3.063

Iter: 4700
D loss: 0.1641
G_loss: 2.766

Iter: 4800
D loss: 0.3019
G_loss: 2.617

Iter: 4900
D loss: 0.2423
G_loss: 3.041

Iter: 5000
D loss: 0.1451
G_loss: 3.425

Iter: 5100
D loss: 0.1851
G_loss: 3.129

Iter: 5200
D loss: 0.2885
G_loss: 2.679

Iter: 5300
D loss: 0.2269
G_loss: 3.325

Iter: 5400
D loss: 0.1321
G_loss: 3.144

Iter: 5500
D loss: 0.1967
G_loss: 3.225

Iter: 5600
D loss: 0.2531
G_loss: 2.948

Iter: 5700
D loss: 0.2644
G_loss: 2.736

Iter: 5800
D loss: 0.2632
G_loss: 2.777

Iter: 5900
D loss: 0.2317
G_loss: 3.192

Iter: 6000
D loss: 0.2102
G_loss: 3.224

Iter: 6100
D loss: 0.2251
G_loss: 2.948

Iter: 6200
D loss: 0.104
G_loss: 3.566

Iter: 6300
D loss: 0.1651
G_loss: 3.254

Iter: 6400
D loss: 0.2251
G_loss: 4.57

Iter: 6500
D loss: 0.2455
G_loss: 2.913

Iter: 6600
D loss: 0.0955
G_loss: 3.681

Iter: 6700
D loss: 0.2112
G_loss: 3.179

Iter: 6800
D loss: 0.2017
G_loss: 3.183

Iter: 6900
D loss: 0.29
G_loss: 2.764

Iter: 7000
D loss: 0.1994
G_loss: 3.29

Iter: 7100
D loss: 0.3148
G_loss: 2.593

Iter: 7200
D loss: 0.2622
G_loss: 2.93

Iter: 7300
D loss: 0.256
G_loss: 2.804

Iter: 7400
D loss: 0.3265
G_loss: 4.203

Iter: 7500
D loss: 0.2524
G_loss: 2.784

Iter: 7600
D loss: 0.3378
G_loss: 2.8

Iter: 7700
D loss: 0.2386
G_loss: 3.023

Iter: 7800
D loss: 0.2723
G_loss: 2.82

Iter: 7900
D loss: 0.1863
G_loss: 3.138

Iter: 8000
D loss: 0.4242
G_loss: 2.337

Iter: 8100
D loss: 0.1898
G_loss: 3.043

Iter: 8200
D loss: 0.3018
G_loss: 2.629

Iter: 8300
D loss: 0.3531
G_loss: 2.69

Iter: 8400
D loss: 0.2642
G_loss: 3.219

Iter: 8500
D loss: 0.2773
G_loss: 3.365

Iter: 8600
D loss: 0.2754
G_loss: 3.003

Iter: 8700
D loss: 0.3248
G_loss: 2.531

Iter: 8800
D loss: 0.271
G_loss: 3.027

Iter: 8900
D loss: 0.3599
G_loss: 2.509

Iter: 9000
D loss: 0.1856
G_loss: 2.985

Iter: 9100
D loss: 0.3641
G_loss: 2.444

Iter: 9200
D loss: 0.3299
G_loss: 2.578

Iter: 9300
D loss: 0.3745
G_loss: 2.469

Iter: 9400
D loss: 0.1192
G_loss: 3.273

Iter: 9500
D loss: 0.2831
G_loss: 2.573

Iter: 9600
D loss: 0.3775
G_loss: 2.314

Iter: 9700
D loss: 0.3662
G_loss: 2.397

Iter: 9800
D loss: 0.3573
G_loss: 2.526

Iter: 9900
D loss: 0.3045
G_loss: 2.856

Observation

Increasing the learnignrate did directly affect the loss . As there is a change in the graphs of the generator and discriminator loss . There seems to rise and drop in loss with the increase in the learning rate. Lets compare the images

First Image 000.png

Last Image 009.png

Increasing the learning rate did have an effect as now it leanrt the areas of pixels. Addtionally, in comparison to previous images with SGD (learning rate of 0.001) the pixels were yet sprawled and not localized. At this point , it ould be interesting to see of training the very same model for larger number of epochs would have an impact.

D loss: 0.3045 G_loss: 2.856

Lets increase the number of epochs to see if there. But it must be noted that the discimator loss is low in comparison to bench mark model which is a good sign.

Next, lets keep all hyper parameters as they are and just increase number of epochs to 20000 to see increasing the learning rate had an impact

In [43]:
## Increasing number of Epochs with SGD and LR -0.1

# Hyper parameter tuning Activation Function Leaky Relu

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as npd_train_loss
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
%matplotlib inline

epoch_list=[]
d_train_loss=[]
g_train_loss=[]

def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)


X = tf.placeholder(tf.float32, shape=[None, 784])

D_W1 = tf.Variable(xavier_init([784, 128]))
D_b1 = tf.Variable(tf.zeros(shape=[128]))

D_W2 = tf.Variable(xavier_init([128, 1]))
D_b2 = tf.Variable(tf.zeros(shape=[1]))

theta_D = [D_W1, D_W2, D_b1, D_b2]


Z = tf.placeholder(tf.float32, shape=[None, 100])

G_W1 = tf.Variable(xavier_init([100, 128]))
G_b1 = tf.Variable(tf.zeros(shape=[128]))

G_W2 = tf.Variable(xavier_init([128, 784]))
G_b2 = tf.Variable(tf.zeros(shape=[784]))

theta_G = [G_W1, G_W2, G_b1, G_b2]


def sample_Z(m, n):
    return np.random.uniform(-1., 1., size=[m, n])


def generator(z):
    G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)

    return G_prob


def discriminator(x):
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)

    return D_prob, D_logit


def plot(samples):
    fig = plt.figure(figsize=(4, 4))
    gs = gridspec.GridSpec(4, 4)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

    return fig


G_sample = generator(Z)
D_real, D_logit_real = discriminator(X)
D_fake, D_logit_fake = discriminator(G_sample)

# D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
# G_loss = -tf.reduce_mean(tf.log(D_fake))

# Alternative losses:
# -------------------
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_real, labels=tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake, labels=tf.ones_like(D_logit_fake)))

D_solver = tf.train.GradientDescentOptimizer(0.1).minimize(D_loss, var_list=theta_D)
G_solver = tf.train.GradientDescentOptimizer(0.1).minimize(G_loss, var_list=theta_G)

mb_size = 128
Z_dim = 100

mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

if not os.path.exists('out_optimizer_SGD_LR_Epochs/'):
    os.makedirs('out_optimizer_SGD_LR_Epochs/')

i = 0
# increase the learning rate to 20000
for it in range(20000):
    if it % 1000 == 0:
        samples = sess.run(G_sample, feed_dict={Z: sample_Z(16, Z_dim)})

        fig = plot(samples)
        plt.savefig('out_optimizer_SGD_LR_Epochs/{}.png'.format(str(i).zfill(3)), bbox_inches='tight')
        i += 1
        plt.close(fig)

    X_mb, _ = mnist.train.next_batch(mb_size)

    _, D_loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: sample_Z(mb_size, Z_dim)})

    if it % 100 == 0:
        print('Iter: {}'.format(it))
        print('D loss: {:.4}'. format(D_loss_curr))
        print('G_loss: {:.4}'.format(G_loss_curr))
        print()
        d_train_loss.append(D_loss_curr)
        g_train_loss.append(G_loss_curr)
        epoch_list.append(it)
plot_loss_epoch_generator()
plot_loss_epoch_discriminator()
Extracting ../../MNIST_data\train-images-idx3-ubyte.gz
Extracting ../../MNIST_data\train-labels-idx1-ubyte.gz
Extracting ../../MNIST_data\t10k-images-idx3-ubyte.gz
Extracting ../../MNIST_data\t10k-labels-idx1-ubyte.gz
Iter: 0
D loss: 1.381
G_loss: 4.956

Iter: 100
D loss: 0.4434
G_loss: 3.596

Iter: 200
D loss: 0.394
G_loss: 2.69

Iter: 300
D loss: 0.1935
G_loss: 2.879

Iter: 400
D loss: 0.2681
G_loss: 2.802

Iter: 500
D loss: 0.1124
G_loss: 3.707

Iter: 600
D loss: 0.07602
G_loss: 4.312

Iter: 700
D loss: 0.1268
G_loss: 3.609

Iter: 800
D loss: 0.2654
G_loss: 2.712

Iter: 900
D loss: 0.366
G_loss: 4.259

Iter: 1000
D loss: 0.1663
G_loss: 3.311

Iter: 1100
D loss: 0.2366
G_loss: 3.219

Iter: 1200
D loss: 0.2861
G_loss: 2.882

Iter: 1300
D loss: 0.2117
G_loss: 3.032

Iter: 1400
D loss: 0.323
G_loss: 2.648

Iter: 1500
D loss: 0.3477
G_loss: 2.601

Iter: 1600
D loss: 0.4811
G_loss: 2.77

Iter: 1700
D loss: 0.3659
G_loss: 2.523

Iter: 1800
D loss: 0.3536
G_loss: 2.058

Iter: 1900
D loss: 0.2342
G_loss: 3.111

Iter: 2000
D loss: 0.2468
G_loss: 2.673

Iter: 2100
D loss: 0.336
G_loss: 3.672

Iter: 2200
D loss: 0.2989
G_loss: 2.887

Iter: 2300
D loss: 0.3945
G_loss: 2.258

Iter: 2400
D loss: 0.3459
G_loss: 2.508

Iter: 2500
D loss: 0.2937
G_loss: 2.42

Iter: 2600
D loss: 0.2746
G_loss: 2.663

Iter: 2700
D loss: 0.3244
G_loss: 2.613

Iter: 2800
D loss: 0.2453
G_loss: 3.094

Iter: 2900
D loss: 0.3741
G_loss: 2.872

Iter: 3000
D loss: 0.2612
G_loss: 2.756

Iter: 3100
D loss: 0.2705
G_loss: 2.808

Iter: 3200
D loss: 0.2981
G_loss: 3.202

Iter: 3300
D loss: 0.315
G_loss: 2.726

Iter: 3400
D loss: 0.2998
G_loss: 2.738

Iter: 3500
D loss: 0.2481
G_loss: 3.118

Iter: 3600
D loss: 0.3359
G_loss: 2.608

Iter: 3700
D loss: 0.2568
G_loss: 3.022

Iter: 3800
D loss: 0.2272
G_loss: 3.265

Iter: 3900
D loss: 0.2609
G_loss: 3.288

Iter: 4000
D loss: 0.2374
G_loss: 3.173

Iter: 4100
D loss: 0.2661
G_loss: 2.946

Iter: 4200
D loss: 0.2844
G_loss: 2.996

Iter: 4300
D loss: 0.3313
G_loss: 2.72

Iter: 4400
D loss: 0.2533
G_loss: 3.101

Iter: 4500
D loss: 0.2463
G_loss: 2.784

Iter: 4600
D loss: 0.244
G_loss: 3.021

Iter: 4700
D loss: 0.3682
G_loss: 2.704

Iter: 4800
D loss: 0.2284
G_loss: 2.965

Iter: 4900
D loss: 0.4024
G_loss: 2.775

Iter: 5000
D loss: 0.3528
G_loss: 2.888

Iter: 5100
D loss: 0.2464
G_loss: 3.015

Iter: 5200
D loss: 0.2648
G_loss: 3.033

Iter: 5300
D loss: 0.2259
G_loss: 2.903

Iter: 5400
D loss: 0.2777
G_loss: 2.871

Iter: 5500
D loss: 0.2985
G_loss: 2.512

Iter: 5600
D loss: 0.2429
G_loss: 3.125

Iter: 5700
D loss: 0.2902
G_loss: 2.772

Iter: 5800
D loss: 0.2305
G_loss: 3.022

Iter: 5900
D loss: 0.231
G_loss: 2.955

Iter: 6000
D loss: 0.326
G_loss: 2.713

Iter: 6100
D loss: 0.1995
G_loss: 3.452

Iter: 6200
D loss: 0.3361
G_loss: 2.453

Iter: 6300
D loss: 0.2589
G_loss: 2.854

Iter: 6400
D loss: 0.3266
G_loss: 2.745

Iter: 6500
D loss: 0.2425
G_loss: 2.909

Iter: 6600
D loss: 0.3497
G_loss: 2.75

Iter: 6700
D loss: 0.2126
G_loss: 2.769

Iter: 6800
D loss: 0.2553
G_loss: 2.966

Iter: 6900
D loss: 0.1999
G_loss: 3.197

Iter: 7000
D loss: 0.2771
G_loss: 2.9

Iter: 7100
D loss: 0.2695
G_loss: 2.981

Iter: 7200
D loss: 0.2345
G_loss: 2.967

Iter: 7300
D loss: 0.3157
G_loss: 2.658

Iter: 7400
D loss: 0.3127
G_loss: 2.695

Iter: 7500
D loss: 0.3821
G_loss: 2.222

Iter: 7600
D loss: 0.3207
G_loss: 2.687

Iter: 7700
D loss: 0.1292
G_loss: 3.287

Iter: 7800
D loss: 0.2563
G_loss: 2.62

Iter: 7900
D loss: 0.1613
G_loss: 3.201

Iter: 8000
D loss: 0.2135
G_loss: 2.938

Iter: 8100
D loss: 0.3206
G_loss: 2.766

Iter: 8200
D loss: 0.4279
G_loss: 2.176

Iter: 8300
D loss: 0.2998
G_loss: 2.78

Iter: 8400
D loss: 0.2288
G_loss: 2.966

Iter: 8500
D loss: 0.3476
G_loss: 2.728

Iter: 8600
D loss: 0.2201
G_loss: 2.942

Iter: 8700
D loss: 0.3485
G_loss: 2.759

Iter: 8800
D loss: 0.2334
G_loss: 3.038

Iter: 8900
D loss: 0.2197
G_loss: 3.026

Iter: 9000
D loss: 0.266
G_loss: 2.883

Iter: 9100
D loss: 0.3234
G_loss: 2.545

Iter: 9200
D loss: 0.2565
G_loss: 3.031

Iter: 9300
D loss: 0.3319
G_loss: 2.637

Iter: 9400
D loss: 0.2446
G_loss: 2.927

Iter: 9500
D loss: 0.261
G_loss: 2.994

Iter: 9600
D loss: 0.3276
G_loss: 2.701

Iter: 9700
D loss: 0.3644
G_loss: 2.382

Iter: 9800
D loss: 0.3267
G_loss: 2.698

Iter: 9900
D loss: 0.2059
G_loss: 3.013

Iter: 10000
D loss: 0.3412
G_loss: 2.647

Iter: 10100
D loss: 0.2821
G_loss: 2.777

Iter: 10200
D loss: 0.311
G_loss: 2.688

Iter: 10300
D loss: 0.2517
G_loss: 2.899

Iter: 10400
D loss: 0.279
G_loss: 2.696

Iter: 10500
D loss: 0.3669
G_loss: 2.403

Iter: 10600
D loss: 0.3811
G_loss: 2.579

Iter: 10700
D loss: 0.4113
G_loss: 2.269

Iter: 10800
D loss: 0.2107
G_loss: 2.856

Iter: 10900
D loss: 0.3593
G_loss: 2.551

Iter: 11000
D loss: 0.1764
G_loss: 3.149

Iter: 11100
D loss: 0.2856
G_loss: 2.712

Iter: 11200
D loss: 0.1906
G_loss: 3.148

Iter: 11300
D loss: 0.2913
G_loss: 2.823

Iter: 11400
D loss: 0.2885
G_loss: 2.809

Iter: 11500
D loss: 0.1631
G_loss: 3.13

Iter: 11600
D loss: 0.2711
G_loss: 2.925

Iter: 11700
D loss: 0.356
G_loss: 2.506

Iter: 11800
D loss: 0.232
G_loss: 3.041

Iter: 11900
D loss: 0.2618
G_loss: 2.852

Iter: 12000
D loss: 0.2819
G_loss: 2.83

Iter: 12100
D loss: 0.4243
G_loss: 2.232

Iter: 12200
D loss: 0.3694
G_loss: 2.46

Iter: 12300
D loss: 0.2282
G_loss: 2.884

Iter: 12400
D loss: 0.249
G_loss: 3.024

Iter: 12500
D loss: 0.2825
G_loss: 2.86

Iter: 12600
D loss: 0.3447
G_loss: 2.734

Iter: 12700
D loss: 0.3362
G_loss: 2.518

Iter: 12800
D loss: 0.2757
G_loss: 2.851

Iter: 12900
D loss: 0.3028
G_loss: 2.634

Iter: 13000
D loss: 0.2268
G_loss: 3.004

Iter: 13100
D loss: 0.1509
G_loss: 3.143

Iter: 13200
D loss: 0.3461
G_loss: 2.699

Iter: 13300
D loss: 0.251
G_loss: 3.055

Iter: 13400
D loss: 0.3296
G_loss: 2.514

Iter: 13500
D loss: 0.3319
G_loss: 2.483

Iter: 13600
D loss: 0.3092
G_loss: 2.673

Iter: 13700
D loss: 0.3291
G_loss: 2.424

Iter: 13800
D loss: 0.2319
G_loss: 2.943

Iter: 13900
D loss: 0.3037
G_loss: 2.661

Iter: 14000
D loss: 0.2352
G_loss: 2.78

Iter: 14100
D loss: 0.329
G_loss: 2.896

Iter: 14200
D loss: 0.2522
G_loss: 2.675

Iter: 14300
D loss: 0.359
G_loss: 2.311

Iter: 14400
D loss: 0.2682
G_loss: 3.014

Iter: 14500
D loss: 0.3808
G_loss: 2.43

Iter: 14600
D loss: 0.2823
G_loss: 2.748

Iter: 14700
D loss: 0.3614
G_loss: 2.7

Iter: 14800
D loss: 0.278
G_loss: 2.422

Iter: 14900
D loss: 0.3521
G_loss: 2.552

Iter: 15000
D loss: 0.3404
G_loss: 2.76

Iter: 15100
D loss: 0.3535
G_loss: 2.601

Iter: 15200
D loss: 0.3661
G_loss: 2.457

Iter: 15300
D loss: 0.2831
G_loss: 2.793

Iter: 15400
D loss: 0.3327
G_loss: 2.839

Iter: 15500
D loss: 0.3364
G_loss: 2.795

Iter: 15600
D loss: 0.2382
G_loss: 2.881

Iter: 15700
D loss: 0.2816
G_loss: 2.567

Iter: 15800
D loss: 0.214
G_loss: 3.259

Iter: 15900
D loss: 0.2557
G_loss: 2.918

Iter: 16000
D loss: 0.3167
G_loss: 2.762

Iter: 16100
D loss: 0.2965
G_loss: 2.768

Iter: 16200
D loss: 0.3019
G_loss: 2.529

Iter: 16300
D loss: 0.291
G_loss: 2.899

Iter: 16400
D loss: 0.3311
G_loss: 2.469

Iter: 16500
D loss: 0.3038
G_loss: 2.701

Iter: 16600
D loss: 0.445
G_loss: 2.051

Iter: 16700
D loss: 0.3129
G_loss: 2.872

Iter: 16800
D loss: 0.2704
G_loss: 2.589

Iter: 16900
D loss: 0.4052
G_loss: 2.443

Iter: 17000
D loss: 0.2534
G_loss: 2.716

Iter: 17100
D loss: 0.3902
G_loss: 2.424

Iter: 17200
D loss: 0.2823
G_loss: 2.644

Iter: 17300
D loss: 0.2594
G_loss: 2.905

Iter: 17400
D loss: 0.3267
G_loss: 2.69

Iter: 17500
D loss: 0.3551
G_loss: 2.62

Iter: 17600
D loss: 0.2589
G_loss: 3.223

Iter: 17700
D loss: 0.2508
G_loss: 2.507

Iter: 17800
D loss: 0.249
G_loss: 2.94

Iter: 17900
D loss: 0.404
G_loss: 2.51

Iter: 18000
D loss: 0.3514
G_loss: 2.592

Iter: 18100
D loss: 0.2851
G_loss: 2.549

Iter: 18200
D loss: 0.3365
G_loss: 2.659

Iter: 18300
D loss: 0.3335
G_loss: 2.618

Iter: 18400
D loss: 0.2929
G_loss: 2.864

Iter: 18500
D loss: 0.3208
G_loss: 2.795

Iter: 18600
D loss: 0.374
G_loss: 2.706

Iter: 18700
D loss: 0.2882
G_loss: 2.707

Iter: 18800
D loss: 0.337
G_loss: 2.751

Iter: 18900
D loss: 0.2232
G_loss: 2.94

Iter: 19000
D loss: 0.3242
G_loss: 2.776

Iter: 19100
D loss: 0.2643
G_loss: 2.655

Iter: 19200
D loss: 0.2257
G_loss: 2.716

Iter: 19300
D loss: 0.2675
G_loss: 2.824

Iter: 19400
D loss: 0.1796
G_loss: 3.176

Iter: 19500
D loss: 0.2965
G_loss: 2.641

Iter: 19600
D loss: 0.3546
G_loss: 2.682

Iter: 19700
D loss: 0.2901
G_loss: 2.767

Iter: 19800
D loss: 0.3082
G_loss: 2.566

Iter: 19900
D loss: 0.2999
G_loss: 2.861

Observation

Increasing the number epochs with increase in elarning rate and optimizer as SGD did improve the quality of image generate. It is prominent by the decrease in discriminator loss

D loss: 0.2999 G_loss: 2.861

This loss is very less in comparison to the loss of the bench mark model. 000.png

019.png

Final Result of tuning the optimizer and learning rate

It is observed that when the learning rate was increased the with the Stochastic gradient descent the loss Decreased further. It is worth mentioning that even with alearning rate of 0.01 the loss was much less than that of the bench mark model Hence , the optimizer and learning rate are imporatant combination to tune for a GAN.

Final Result of Hyper Parameter Tuning the GAN

  1. It is observed that tuning the activation function plays an imporatant role in tuning the Gan. Though the combination of Relu and Sigmoid for the provided the least loss for the discriminator

  2. Training the model with SGD and learning rate of 0.01 and then 0.1 respectively reduced the loss significantly . It was much better thanthe loss of the bench mark model. Additonally increasing the number of epochs with increased learning rate helped further.

To conclude the activation function , optimizer and learning rate are imporatnt and prospective parameters to tune for the GAN

Summary

image.png

Autoencoder

Overview of Autoencoders

An autoencoder is an artificial neural network used for unsupervised learning of efficient codings. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. Recently, the autoencoder concept has become more widely used for learning generative models of data.

Dataset Description

The MNIST database of handwritten digit, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. Each image as dimensions of28X28 images.The dataset consists of pair, “handwritten digit image” and “label”. Digit ranges from 0 to 9, meaning 10 patterns in total. It cnsists of he folowing : handwritten digit image: This is gray scale image with size 28 x 28 pixel. label : This is actual digit number this handwritten digit image represents. It is either 0 to 9.

This is popular dataset set used in Datascience and a popular Hello World Dataset

Dataset Download and Use

We will be using the MNIST Datset from the Tensorflow Library. This will be automatically used from the tensorflow code. No pre- requisites are present.

!image.png

Structure of the Autoencoder

This is a denoising Autoencoder. The autoencoder uses an activation of leaky relu. The encoder and decoder consist of a convolutional layer and uses the Sigmoid Activation with the Sigmoid Cross entropy cost function having using the Adam Optimizer. It is initialized with a learning rate of 0.00001 noise factor of 0.5.

Code in tensorflow for Autoencoder Model

The structure of code for the autoencoder for tensorflow is similar. It consists of place holders, hyper parameter initilaization and running the tensorflow session . Lets walk through the code

The encoder has two convolutional layers and two max pooling layers. Both Convolution layer-1 and Convolution layer-2 have 32-3 x 3 filters. There are two max-pooling layers each of size 2 x 2.

Import necessary libraries

In [1]:
import numpy as np
import sys
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
C:\Users\jaini\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

read the data from the MNISTdataset Tensorflow

In [47]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Define the input and output placeholders

In [48]:
inputs_ = tf.placeholder(tf.float32,[None,28,28,1])
targets_ = tf.placeholder(tf.float32,[None,28,28,1])

Define activation functions as leaky relu

In [49]:
#alpha is negative slope coefficient
def lrelu(x,alpha=0.1):
    return tf.maximum(alpha*x,x)

Defining the encoder. It consists of 2 convolutional layers and max pooling layers. Each convolutional layer down samples the images

In [51]:
### Encoder
with tf.name_scope('en-convolutions'):
    conv1 = tf.layers.conv2d(inputs_,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv1')
# Now 28x28x32
with tf.name_scope('en-pooling'):
    maxpool1 = tf.layers.max_pooling2d(conv1,pool_size=(2,2),strides=(2,2),name='pool1')
# Now 14x14x32
with tf.name_scope('en-convolutions'):
    conv2 = tf.layers.conv2d(maxpool1,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv2')
# Now 14x14x32
with tf.name_scope('encoding'):
    encoded = tf.layers.max_pooling2d(conv2,pool_size=(2,2),strides=(2,2),name='encoding')
# Now 7x7x32.
#latent space

Each decoder upsamples the image

In [52]:
### Decoder
with tf.name_scope('decoder'):
    conv3 = tf.layers.conv2d(encoded,filters=32,kernel_size=(3,3),strides=(1,1),name='conv3',padding='SAME',use_bias=True,activation=lrelu)
#Now 7x7x32        
    upsample1 = tf.layers.conv2d_transpose(conv3,filters=32,kernel_size=3,padding='same',strides=2,name='upsample1')
# Now 14x14x32
    upsample2 = tf.layers.conv2d_transpose(upsample1,filters=32,kernel_size=3,padding='same',strides=2,name='upsample2')
# Now 28x28x32
    logits = tf.layers.conv2d(upsample2,filters=1,kernel_size=(3,3),strides=(1,1),name='logits',padding='SAME',use_bias=True)
#Now 28x28x1
# Pass logits through sigmoid to get reconstructed image
    decoded = tf.sigmoid(logits,name='recon')

Define the loss , learning rate and optimizer

In [53]:
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=targets_)

learning_rate=tf.placeholder(tf.float32)
cost = tf.reduce_mean(loss)  #cost
opt = tf.train.AdamOptimizer(learning_rate).minimize(cost) #optimizer

Lets train the model and initialize the tensorflow session. Follow the comments closely to understand the code.

In [56]:
# Training

sess = tf.Session()
#tf.reset_default_graph()

# saver = tf.train.Saver()
loss = []
valid_loss = []



display_step = 1
epochs = 5
batch_size = 64
#lr=[1e-3/(2**(i//5))for i in range(epochs)]
#learning rate
lr=1e-5
sess.run(tf.global_variables_initializer())
# writer = tf.summary.FileWriter('./graphs', sess.graph)

#training in batches
for e in range(epochs):
    total_batch = int(mnist.train.num_examples/batch_size)
    for ibatch in range(total_batch):
        batch_x = mnist.train.next_batch(batch_size)
        batch_test_x= mnist.test.next_batch(batch_size)
        imgs_test = batch_x[0].reshape((-1, 28, 28, 1))
        #introduce noise for the test set
        noise_factor = 0.5
        x_test_noisy = imgs_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_test.shape) 
        x_test_noisy = np.clip(x_test_noisy, 0., 1.)
        imgs = batch_x[0].reshape((-1, 28, 28, 1))
        #introduce noise for train set
        x_train_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
        x_train_noisy = np.clip(x_train_noisy, 0., 1.)
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: x_train_noisy,
                                                         targets_: imgs,learning_rate:lr})
      #calculate the cost
        batch_cost_test = sess.run(cost, feed_dict={inputs_: x_test_noisy,
                                                         targets_: imgs_test})
    if (e+1) % display_step == 0:
        print("Epoch: {}/{}...".format(e+1, epochs),
                  "Training loss: {:.4f}".format(batch_cost),
                 "Validation loss: {:.4f}".format(batch_cost_test))
  # Plot the loss and epochs 
    loss.append(batch_cost)
    valid_loss.append(batch_cost_test)
    plt.plot(range(e+1), loss, 'bo', label='Training loss')
    plt.plot(range(e+1), valid_loss, 'r', label='Validation loss')
    plt.title('Training and validation loss')
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Loss',fontsize=16)
    plt.legend()
    plt.figure()
    plt.show()
#     saver.save(sess, 'encode_model') 
# plot real image, noise image and generated image
batch_x= mnist.test.next_batch(10)
imgs = batch_x[0].reshape((-1, 28, 28, 1))
noise_factor = 0.5
x_test_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
recon_img = sess.run([decoded], feed_dict={inputs_: x_test_noisy})[0]
plt.figure(figsize=(20, 4))
plt.title('Reconstructed Images')
print("Original Images")
for i in range(10):
    plt.subplot(2, 10, i+1)
    plt.imshow(imgs[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Noisy Images")
for i in range(10):
    plt.subplot(2, 10, i+1)
    plt.imshow(x_test_noisy[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Reconstruction of Noisy Images")
for i in range(10):
    plt.subplot(2, 10, i+1)
    plt.imshow(recon_img[i, ..., 0], cmap='gray')    
plt.show()    

writer.close()

sess.close()
Epoch: 1/5... Training loss: 0.5105 Validation loss: 0.5102
<matplotlib.figure.Figure at 0x17a2eec0dd8>
Epoch: 2/5... Training loss: 0.3627 Validation loss: 0.3604
<matplotlib.figure.Figure at 0x17a2eec9898>
Epoch: 3/5... Training loss: 0.1999 Validation loss: 0.2030
<matplotlib.figure.Figure at 0x17a2eeb5358>
Epoch: 4/5... Training loss: 0.1784 Validation loss: 0.1787
<matplotlib.figure.Figure at 0x17a24cc5b00>
Epoch: 5/5... Training loss: 0.1575 Validation loss: 0.1575
<matplotlib.figure.Figure at 0x17a24cb2128>
Original Images
Noisy Images
Reconstruction of Noisy Images

Observation :

We observe that the training loss and testing loss go hand in hand for each epoch. This is good performance as the training and testing is consistent. This was run for 5 epochs as the generation of images took extremely long on the current computational power. However, it is observed that for 5 epochs the autoencoder has done a good job as the images have been generated and can be distinguished one from the other to the extent that we have faintly recognize each digit. The parameters used include the Adam Optimizer, 2 layered convolution neural network for the encoder and the decoder. The activation used is Relu.

Training loss: 0.1575 Validation loss: 0.1575

Next, we will tune various hyper parameters controlling the Auto Encoder

Hyper Parameter Tuning for Auto Encoder

Lets start by tuning the optimizer with RMSPROp. Initial code of the model remains the same. Lokk for changes in the comments

In [8]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [9]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

# Place holders for the network

inputs_ = tf.placeholder(tf.float32,[None,28,28,1])
targets_ = tf.placeholder(tf.float32,[None,28,28,1])

# Activation Function Relu
def lrelu(x,alpha=0.1):
    return tf.maximum(alpha*x,x)

### Encoder
with tf.name_scope('en-convolutions'):
    conv1 = tf.layers.conv2d(inputs_,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv1')
# Now 28x28x32
with tf.name_scope('en-pooling'):
    maxpool1 = tf.layers.max_pooling2d(conv1,pool_size=(2,2),strides=(2,2),name='pool1')
# Now 14x14x32
with tf.name_scope('en-convolutions'):
    conv2 = tf.layers.conv2d(maxpool1,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv2')
# Now 14x14x32
with tf.name_scope('encoding'):
    encoded = tf.layers.max_pooling2d(conv2,pool_size=(2,2),strides=(2,2),name='encoding')
# Now 7x7x32.
#latent space

### Decoder
with tf.name_scope('decoder'):
    conv3 = tf.layers.conv2d(encoded,filters=32,kernel_size=(3,3),strides=(1,1),name='conv3',padding='SAME',use_bias=True,activation=lrelu)
#Now 7x7x32        
    upsample1 = tf.layers.conv2d_transpose(conv3,filters=32,kernel_size=3,padding='same',strides=2,name='upsample1')
# Now 14x14x32
    upsample2 = tf.layers.conv2d_transpose(upsample1,filters=32,kernel_size=3,padding='same',strides=2,name='upsample2')
# Now 28x28x32
    logits = tf.layers.conv2d(upsample2,filters=1,kernel_size=(3,3),strides=(1,1),name='logits',padding='SAME',use_bias=True)
#Now 28x28x1
# Pass logits through sigmoid to get reconstructed image
    decoded = tf.sigmoid(logits,name='recon')

#Defining the learning rate and cost

loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=targets_)
# change optimizer to RMSPRop
learning_rate=tf.placeholder(tf.float32)
cost = tf.reduce_mean(loss)  #cost
opt = tf.train.RMSPropOptimizer(learning_rate).minimize(cost) #optimizer


# Training

sess = tf.Session()
#tf.reset_default_graph()

# saver = tf.train.Saver()
loss = []
valid_loss = []
epoch_list=[]



display_step = 1
epochs = 5
batch_size = 64
#lr=[1e-3/(2**(i//5))for i in range(epochs)]
#learning rate value
lr=1e-5

# Start the Tensorflow Session
sess.run(tf.global_variables_initializer())
# writer = tf.summary.FileWriter('./graphs', sess.graph)
for e in range(epochs):
    total_batch = int(mnist.train.num_examples/batch_size)
    for ibatch in range(total_batch):
        batch_x = mnist.train.next_batch(batch_size)
        batch_test_x= mnist.test.next_batch(batch_size)
        imgs_test = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in the test set
        noise_factor = 0.5
        x_test_noisy = imgs_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_test.shape) 
        x_test_noisy = np.clip(x_test_noisy, 0., 1.)
        imgs = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in training set
        x_train_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
        x_train_noisy = np.clip(x_train_noisy, 0., 1.)
# Loss for the training set
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: x_train_noisy,
                                                         targets_: imgs,learning_rate:lr})
#loss for the testing set   
        batch_cost_test = sess.run(cost, feed_dict={inputs_: x_test_noisy,
                                                         targets_: imgs_test})
    if (e+1) % display_step == 0:
        print("Epoch: {}/{}...".format(e+1, epochs),
                  "Training loss: {:.4f}".format(batch_cost),
                 "Validation loss: {:.4f}".format(batch_cost_test))
   
        loss.append(batch_cost)
        valid_loss.append(batch_cost_test)
        epoch_list.append(e)

#plotting the validation and training loss
plt.plot(epoch_list, loss, 'bo', label='Training loss')
plt.plot(epoch_list, valid_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.legend()
plt.figure()
plt.show()
#     saver.save(sess, 'encode_model') 

#understanding the output for the testing set
# printing origibal, noise induced and generated images
batch_x= mnist.test.next_batch(3)
#inducing noise for the testing set
imgs = batch_x[0].reshape((-1, 28, 28, 1))
noise_factor = 0.5
x_test_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
recon_img = sess.run([decoded], feed_dict={inputs_: x_test_noisy})[0]
plt.figure(figsize=(20, 4))
plt.title('Reconstructed Images')

print("Original Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(imgs[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Noisy Images")
#noisy images
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(x_test_noisy[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Reconstruction of Noisy Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(recon_img[i, ..., 0], cmap='gray')    
plt.show()    

writer.close()

sess.close()
Epoch: 1/5... Training loss: 0.5087 Validation loss: 0.5084
Epoch: 2/5... Training loss: 0.4319 Validation loss: 0.4314
Epoch: 3/5... Training loss: 0.2668 Validation loss: 0.2658
Epoch: 4/5... Training loss: 0.1987 Validation loss: 0.1952
Epoch: 5/5... Training loss: 0.1678 Validation loss: 0.1681
<matplotlib.figure.Figure at 0x22920a55550>
Original Images
Noisy Images
Reconstruction of Noisy Images

Observation

There is clearly a difference in the images generated by the adam optimizer and the RMSProp Optimizer. This clearly shows that the optimizer does play an important role for the Autoencoder. Use of the RMSProp Optimizer did not help as the images generated by the decoder are not extremely clear. Additionally the validation and training losses also reduce consistently. Lets explore the Adadelta optimizer next .

Training loss: 0.1678 Validation loss: 0.1681

Hyper parameter Optimization Autoencoder

let use the Optimizer Adadelta. Reuse the code from the first model. Observe comments to track changes

In [10]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [11]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

# Place holders for the network

inputs_ = tf.placeholder(tf.float32,[None,28,28,1])
targets_ = tf.placeholder(tf.float32,[None,28,28,1])

# Activation Function Relu
def lrelu(x,alpha=0.1):
    return tf.maximum(alpha*x,x)

### Encoder
with tf.name_scope('en-convolutions'):
    conv1 = tf.layers.conv2d(inputs_,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv1')
# Now 28x28x32
with tf.name_scope('en-pooling'):
    maxpool1 = tf.layers.max_pooling2d(conv1,pool_size=(2,2),strides=(2,2),name='pool1')
# Now 14x14x32
with tf.name_scope('en-convolutions'):
    conv2 = tf.layers.conv2d(maxpool1,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv2')
# Now 14x14x32
with tf.name_scope('encoding'):
    encoded = tf.layers.max_pooling2d(conv2,pool_size=(2,2),strides=(2,2),name='encoding')
# Now 7x7x32.
#latent space

### Decoder
with tf.name_scope('decoder'):
    conv3 = tf.layers.conv2d(encoded,filters=32,kernel_size=(3,3),strides=(1,1),name='conv3',padding='SAME',use_bias=True,activation=lrelu)
#Now 7x7x32        
    upsample1 = tf.layers.conv2d_transpose(conv3,filters=32,kernel_size=3,padding='same',strides=2,name='upsample1')
# Now 14x14x32
    upsample2 = tf.layers.conv2d_transpose(upsample1,filters=32,kernel_size=3,padding='same',strides=2,name='upsample2')
# Now 28x28x32
    logits = tf.layers.conv2d(upsample2,filters=1,kernel_size=(3,3),strides=(1,1),name='logits',padding='SAME',use_bias=True)
#Now 28x28x1
# Pass logits through sigmoid to get reconstructed image
    decoded = tf.sigmoid(logits,name='recon')

#Defining the learning rate and cost

loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=targets_)
# change optimizer here to Adadelta
learning_rate=tf.placeholder(tf.float32)
cost = tf.reduce_mean(loss)  #cost
opt = tf.train.AdadeltaOptimizer(learning_rate).minimize(cost) #optimizer


# Training

sess = tf.Session()
#tf.reset_default_graph()

# saver = tf.train.Saver()
loss = []
valid_loss = []
epoch_list=[]



display_step = 1
epochs = 5
batch_size = 64
#lr=[1e-3/(2**(i//5))for i in range(epochs)]
#learning rate value
lr=1e-5

# Start the Tensorflow Session
sess.run(tf.global_variables_initializer())
# writer = tf.summary.FileWriter('./graphs', sess.graph)
for e in range(epochs):
    total_batch = int(mnist.train.num_examples/batch_size)
    for ibatch in range(total_batch):
        batch_x = mnist.train.next_batch(batch_size)
        batch_test_x= mnist.test.next_batch(batch_size)
        imgs_test = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in the test set
        noise_factor = 0.5
        x_test_noisy = imgs_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_test.shape) 
        x_test_noisy = np.clip(x_test_noisy, 0., 1.)
        imgs = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in training set
        x_train_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
        x_train_noisy = np.clip(x_train_noisy, 0., 1.)
# Loss for the training set
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: x_train_noisy,
                                                         targets_: imgs,learning_rate:lr})
#loss for the testing set   
        batch_cost_test = sess.run(cost, feed_dict={inputs_: x_test_noisy,
                                                         targets_: imgs_test})
    if (e+1) % display_step == 0:
        print("Epoch: {}/{}...".format(e+1, epochs),
                  "Training loss: {:.4f}".format(batch_cost),
                 "Validation loss: {:.4f}".format(batch_cost_test))
   
        loss.append(batch_cost)
        valid_loss.append(batch_cost_test)
        epoch_list.append(e)

#plotting the validation and training loss
plt.plot(epoch_list, loss, 'bo', label='Training loss')
plt.plot(epoch_list, valid_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.legend()
plt.figure()
plt.show()
#     saver.save(sess, 'encode_model') 

#understanding the output for the testing set
# printing origibal, noise induced and generated images
batch_x= mnist.test.next_batch(3)
#inducing noise for the testing set
imgs = batch_x[0].reshape((-1, 28, 28, 1))
noise_factor = 0.5
x_test_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
recon_img = sess.run([decoded], feed_dict={inputs_: x_test_noisy})[0]
plt.figure(figsize=(20, 4))
plt.title('Reconstructed Images')

print("Original Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(imgs[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Noisy Images")
#noisy images
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(x_test_noisy[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Reconstruction of Noisy Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(recon_img[i, ..., 0], cmap='gray')    
plt.show()    

writer.close()

sess.close()
Epoch: 1/5... Training loss: 0.6974 Validation loss: 0.6974
Epoch: 2/5... Training loss: 0.6973 Validation loss: 0.6974
Epoch: 3/5... Training loss: 0.6972 Validation loss: 0.6972
Epoch: 4/5... Training loss: 0.6971 Validation loss: 0.6971
Epoch: 5/5... Training loss: 0.6970 Validation loss: 0.6970
<matplotlib.figure.Figure at 0x2292073e6d8>
Original Images
Noisy Images
Reconstruction of Noisy Images

Observation:

Using the Adadelta optimizer throws the validation off slightly in comparison to the training loss. In most cases the validation loss is greater than the training loss. Additonally, comparing the images shows that the Adam Optimizer performed the best in comparison to RMSProp and Adadelta. Though it possible to monitor the performance of the network with RMSProp for larger number of epochs to check the performance.

Training loss: 0.6970 Validation loss: 0.6970

Final Result for Optimizer

The adam optimizer performed the best. this is a prospective parameter to tune

Hyperparameter Tuning Autoencoder

Lets change the loss function to Hinge_loss. Observe the change in code

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [8]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

# Place holders for the network

inputs_ = tf.placeholder(tf.float32,[None,28,28,1])
targets_ = tf.placeholder(tf.float32,[None,28,28,1])

# Activation Function Relu
def lrelu(x,alpha=0.1):
    return tf.maximum(alpha*x,x)

### Encoder
with tf.name_scope('en-convolutions'):
    conv1 = tf.layers.conv2d(inputs_,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv1')
# Now 28x28x32
with tf.name_scope('en-pooling'):
    maxpool1 = tf.layers.max_pooling2d(conv1,pool_size=(2,2),strides=(2,2),name='pool1')
# Now 14x14x32
with tf.name_scope('en-convolutions'):
    conv2 = tf.layers.conv2d(maxpool1,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv2')
# Now 14x14x32
with tf.name_scope('encoding'):
    encoded = tf.layers.max_pooling2d(conv2,pool_size=(2,2),strides=(2,2),name='encoding')
# Now 7x7x32.
#latent space

### Decoder
with tf.name_scope('decoder'):
    conv3 = tf.layers.conv2d(encoded,filters=32,kernel_size=(3,3),strides=(1,1),name='conv3',padding='SAME',use_bias=True,activation=lrelu)
#Now 7x7x32        
    upsample1 = tf.layers.conv2d_transpose(conv3,filters=32,kernel_size=3,padding='same',strides=2,name='upsample1')
# Now 14x14x32
    upsample2 = tf.layers.conv2d_transpose(upsample1,filters=32,kernel_size=3,padding='same',strides=2,name='upsample2')
# Now 28x28x32
    logits = tf.layers.conv2d(upsample2,filters=1,kernel_size=(3,3),strides=(1,1),name='logits',padding='SAME',use_bias=True)
#Now 28x28x1
# Pass logits through sigmoid to get reconstructed image
    decoded = tf.sigmoid(logits,name='recon')

#Defining the learning rate and cost
#change to hinge_loss
loss = tf.losses.hinge_loss(targets_,logits)

learning_rate=tf.placeholder(tf.float32)
cost = tf.reduce_mean(loss)  #cost
opt = tf.train.AdamOptimizer(learning_rate).minimize(cost) #optimizer


# Training

sess = tf.Session()
#tf.reset_default_graph()

# saver = tf.train.Saver()
loss = []
valid_loss = []
epoch_list=[]



display_step = 1
epochs = 5
batch_size = 64
#lr=[1e-3/(2**(i//5))for i in range(epochs)]
#learning rate value
lr=1e-5

# Start the Tensorflow Session
sess.run(tf.global_variables_initializer())
# writer = tf.summary.FileWriter('./graphs', sess.graph)
for e in range(epochs):
    total_batch = int(mnist.train.num_examples/batch_size)
    for ibatch in range(total_batch):
        batch_x = mnist.train.next_batch(batch_size)
        batch_test_x= mnist.test.next_batch(batch_size)
        imgs_test = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in the test set
        noise_factor = 0.5
        x_test_noisy = imgs_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_test.shape) 
        x_test_noisy = np.clip(x_test_noisy, 0., 1.)
        imgs = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in training set
        x_train_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
        x_train_noisy = np.clip(x_train_noisy, 0., 1.)
# Loss for the training set
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: x_train_noisy,
                                                         targets_: imgs,learning_rate:lr})
#loss for the testing set   
        batch_cost_test = sess.run(cost, feed_dict={inputs_: x_test_noisy,
                                                         targets_: imgs_test})
    if (e+1) % display_step == 0:
        print("Epoch: {}/{}...".format(e+1, epochs),
                  "Training loss: {:.4f}".format(batch_cost),
                 "Validation loss: {:.4f}".format(batch_cost_test))
   
        loss.append(batch_cost)
        valid_loss.append(batch_cost_test)
        epoch_list.append(e)

#plotting the validation and training loss
plt.plot(epoch_list, loss, 'bo', label='Training loss')
plt.plot(epoch_list, valid_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.legend()
plt.figure()
plt.show()
#     saver.save(sess, 'encode_model') 

#understanding the output for the testing set
# printing origibal, noise induced and generated images
batch_x= mnist.test.next_batch(3)
#inducing noise for the testing set
imgs = batch_x[0].reshape((-1, 28, 28, 1))
noise_factor = 0.5
x_test_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
recon_img = sess.run([decoded], feed_dict={inputs_: x_test_noisy})[0]
plt.figure(figsize=(20, 4))
plt.title('Reconstructed Images')

print("Original Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(imgs[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Noisy Images")
#noisy images
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(x_test_noisy[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Reconstruction of Noisy Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(recon_img[i, ..., 0], cmap='gray')    
plt.show()    

# writer.close()

sess.close()
Epoch: 1/5... Training loss: 0.4431 Validation loss: 0.4419
Epoch: 2/5... Training loss: 0.2960 Validation loss: 0.2965
Epoch: 3/5... Training loss: 0.2657 Validation loss: 0.2652
Epoch: 4/5... Training loss: 0.2265 Validation loss: 0.2273
Epoch: 5/5... Training loss: 0.1848 Validation loss: 0.1855
<matplotlib.figure.Figure at 0x16a005dba58>
Original Images
Noisy Images
Reconstruction of Noisy Images

Observation:

Selecting the Hinge_Loss cost function over the sigmoid_cross_entropy function also slightly affected the performance of the neural network. Using the Hinge_Loss function had a similar output as the sigmoid cross entropy functiion. Although, it would be interesting to observe their performance over a few 100 epochs to correctly distinguish the performance. But clearly hinge_loss can also be used to improve the autoencoder model.

Training loss: 0.1848 Validation loss: 0.1855

Hyperparameter Tuning Autoencoder

Next we will use reduce_sum and sigmoid_cross_entropy (without logits). Observe the changes in the code. Code remains same as the intial model

In [51]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [53]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

# Place holders for the network

inputs_ = tf.placeholder(tf.float32,[None,28,28,1])
targets_ = tf.placeholder(tf.float32,[None,28,28,1])

# Activation Function Relu
def lrelu(x,alpha=0.1):
    return tf.maximum(alpha*x,x)

### Encoder
with tf.name_scope('en-convolutions'):
    conv1 = tf.layers.conv2d(inputs_,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv1')
# Now 28x28x32
with tf.name_scope('en-pooling'):
    maxpool1 = tf.layers.max_pooling2d(conv1,pool_size=(2,2),strides=(2,2),name='pool1')
# Now 14x14x32
with tf.name_scope('en-convolutions'):
    conv2 = tf.layers.conv2d(maxpool1,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv2')
# Now 14x14x32
with tf.name_scope('encoding'):
    encoded = tf.layers.max_pooling2d(conv2,pool_size=(2,2),strides=(2,2),name='encoding')
# Now 7x7x32.
#latent space

### Decoder
with tf.name_scope('decoder'):
    conv3 = tf.layers.conv2d(encoded,filters=32,kernel_size=(3,3),strides=(1,1),name='conv3',padding='SAME',use_bias=True,activation=lrelu)
#Now 7x7x32        
    upsample1 = tf.layers.conv2d_transpose(conv3,filters=32,kernel_size=3,padding='same',strides=2,name='upsample1')
# Now 14x14x32
    upsample2 = tf.layers.conv2d_transpose(upsample1,filters=32,kernel_size=3,padding='same',strides=2,name='upsample2')
# Now 28x28x32
    logits = tf.layers.conv2d(upsample2,filters=1,kernel_size=(3,3),strides=(1,1),name='logits',padding='SAME',use_bias=True)
#Now 28x28x1
# Pass logits through sigmoid to get reconstructed image
    decoded = tf.sigmoid(logits,name='recon')

#Defining the learning rate and cost
# use loss function sigmoid cross entrpy
loss = tf.losses.sigmoid_cross_entropy(targets_,logits)

learning_rate=tf.placeholder(tf.float32)
#change to reduce_sum
cost = tf.reduce_sum(loss)  #cost
opt = tf.train.AdamOptimizer(learning_rate).minimize(cost) #optimizer


# Training

sess = tf.Session()
#tf.reset_default_graph()

# saver = tf.train.Saver()
loss = []
valid_loss = []
epoch_list=[]



display_step = 1
epochs = 5
batch_size = 64
#lr=[1e-3/(2**(i//5))for i in range(epochs)]
#learning rate value
lr=1e-5

# Start the Tensorflow Session
sess.run(tf.global_variables_initializer())
# writer = tf.summary.FileWriter('./graphs', sess.graph)
for e in range(epochs):
    total_batch = int(mnist.train.num_examples/batch_size)
    for ibatch in range(total_batch):
        batch_x = mnist.train.next_batch(batch_size)
        batch_test_x= mnist.test.next_batch(batch_size)
        imgs_test = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in the test set
        noise_factor = 0.5
        x_test_noisy = imgs_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_test.shape) 
        x_test_noisy = np.clip(x_test_noisy, 0., 1.)
        imgs = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in training set
        x_train_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
        x_train_noisy = np.clip(x_train_noisy, 0., 1.)
# Loss for the training set
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: x_train_noisy,
                                                         targets_: imgs,learning_rate:lr})
#loss for the testing set   
        batch_cost_test = sess.run(cost, feed_dict={inputs_: x_test_noisy,
                                                         targets_: imgs_test})
    if (e+1) % display_step == 0:
        print("Epoch: {}/{}...".format(e+1, epochs),
                  "Training loss: {:.4f}".format(batch_cost),
                 "Validation loss: {:.4f}".format(batch_cost_test))
   
        loss.append(batch_cost)
        valid_loss.append(batch_cost_test)
        epoch_list.append(e)

#plotting the validation and training loss
plt.plot(epoch_list, loss, 'bo', label='Training loss')
plt.plot(epoch_list, valid_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.legend()
plt.figure()
plt.show()
#     saver.save(sess, 'encode_model') 

#understanding the output for the testing set
# printing origibal, noise induced and generated images
batch_x= mnist.test.next_batch(3)
#inducing noise for the testing set
imgs = batch_x[0].reshape((-1, 28, 28, 1))
noise_factor = 0.5
x_test_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
recon_img = sess.run([decoded], feed_dict={inputs_: x_test_noisy})[0]
plt.figure(figsize=(20, 4))
plt.title('Reconstructed Images')

print("Original Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(imgs[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Noisy Images")
#noisy images
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(x_test_noisy[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Reconstruction of Noisy Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(recon_img[i, ..., 0], cmap='gray')    
plt.show()    

# writer.close()

sess.close()
Epoch: 1/5... Training loss: 0.4954 Validation loss: 0.4951
Epoch: 2/5... Training loss: 0.3516 Validation loss: 0.3507
Epoch: 3/5... Training loss: 0.2003 Validation loss: 0.1980
Epoch: 4/5... Training loss: 0.1699 Validation loss: 0.1687
Epoch: 5/5... Training loss: 0.1586 Validation loss: 0.1584
<matplotlib.figure.Figure at 0x1b191f54b00>
Original Images
Noisy Images
Reconstruction of Noisy Images

Observation:

Using the sigmoid_cross_entropy without logits and calculating the loss as the reduce_sum produced a similar trend of losses between the training set and the validation set. Though the clarity of images suggests that the using the tf.nn.sigmoid_cross_entropy_with_logits with the function of reduce_mean over the tf.nn.sigmoid_cross_entropy with the loss function as reduce_sum. The main difference between both the loss functionas are explained below :

Training loss: 0.1586 Validation loss: 0.1584

  1. Computes sigmoid cross entropy given logits. Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.

  2. tf.losses.sigmoid_cross_entropy Creates a cross-entropy loss using tf.nn.sigmoid_cross_entropy_with_logits. Basically weights act as the coefficient of the loss.If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.

Result for Losses

sigmid cross entropy with an wothoutlogits can be considered for training autoencoders

Hyperparameter Tuning Gan

Loss function

Lets alter noise to 0.3 and observe the change

In [16]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [18]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

# Place holders for the network

inputs_ = tf.placeholder(tf.float32,[None,28,28,1])
targets_ = tf.placeholder(tf.float32,[None,28,28,1])

# Activation Function Relu
def lrelu(x,alpha=0.1):
    return tf.maximum(alpha*x,x)

### Encoder
with tf.name_scope('en-convolutions'):
    conv1 = tf.layers.conv2d(inputs_,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv1')
# Now 28x28x32
with tf.name_scope('en-pooling'):
    maxpool1 = tf.layers.max_pooling2d(conv1,pool_size=(2,2),strides=(2,2),name='pool1')
# Now 14x14x32
with tf.name_scope('en-convolutions'):
    conv2 = tf.layers.conv2d(maxpool1,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv2')
# Now 14x14x32
with tf.name_scope('encoding'):
    encoded = tf.layers.max_pooling2d(conv2,pool_size=(2,2),strides=(2,2),name='encoding')
# Now 7x7x32.
#latent space

### Decoder
with tf.name_scope('decoder'):
    conv3 = tf.layers.conv2d(encoded,filters=32,kernel_size=(3,3),strides=(1,1),name='conv3',padding='SAME',use_bias=True,activation=lrelu)
#Now 7x7x32        
    upsample1 = tf.layers.conv2d_transpose(conv3,filters=32,kernel_size=3,padding='same',strides=2,name='upsample1')
# Now 14x14x32
    upsample2 = tf.layers.conv2d_transpose(upsample1,filters=32,kernel_size=3,padding='same',strides=2,name='upsample2')
# Now 28x28x32
    logits = tf.layers.conv2d(upsample2,filters=1,kernel_size=(3,3),strides=(1,1),name='logits',padding='SAME',use_bias=True)
#Now 28x28x1
# Pass logits through sigmoid to get reconstructed image
    decoded = tf.sigmoid(logits,name='recon')

#Defining the learning rate and cost

loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=targets_)

learning_rate=tf.placeholder(tf.float32)
cost = tf.reduce_mean(loss)  #cost
opt = tf.train.AdamOptimizer(learning_rate).minimize(cost) #optimizer


# Training

sess = tf.Session()
#tf.reset_default_graph()

# saver = tf.train.Saver()
loss = []
valid_loss = []
epoch_list=[]



display_step = 1
epochs = 5
batch_size = 64
#lr=[1e-3/(2**(i//5))for i in range(epochs)]
#learning rate value
lr=1e-5

# Start the Tensorflow Session
sess.run(tf.global_variables_initializer())
# writer = tf.summary.FileWriter('./graphs', sess.graph)
for e in range(epochs):
    total_batch = int(mnist.train.num_examples/batch_size)
    for ibatch in range(total_batch):
        batch_x = mnist.train.next_batch(batch_size)
        batch_test_x= mnist.test.next_batch(batch_size)
        imgs_test = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in the test set
#change noise here
        noise_factor = 0.2
        x_test_noisy = imgs_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_test.shape) 
        x_test_noisy = np.clip(x_test_noisy, 0., 1.)
        imgs = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in training set
        x_train_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
        x_train_noisy = np.clip(x_train_noisy, 0., 1.)
# Loss for the training set
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: x_train_noisy,
                                                         targets_: imgs,learning_rate:lr})
#loss for the testing set   
        batch_cost_test = sess.run(cost, feed_dict={inputs_: x_test_noisy,
                                                         targets_: imgs_test})
    if (e+1) % display_step == 0:
        print("Epoch: {}/{}...".format(e+1, epochs),
                  "Training loss: {:.4f}".format(batch_cost),
                 "Validation loss: {:.4f}".format(batch_cost_test))
   
        loss.append(batch_cost)
        valid_loss.append(batch_cost_test)
        epoch_list.append(e)

#plotting the validation and training loss
plt.plot(epoch_list, loss, 'bo', label='Training loss')
plt.plot(epoch_list, valid_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.legend()
plt.figure()
plt.show()
#     saver.save(sess, 'encode_model') 

#understanding the output for the testing set
# printing origibal, noise induced and generated images
batch_x= mnist.test.next_batch(3)
#inducing noise for the testing set
imgs = batch_x[0].reshape((-1, 28, 28, 1))
noise_factor = 0.2
x_test_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
recon_img = sess.run([decoded], feed_dict={inputs_: x_test_noisy})[0]
plt.figure(figsize=(20, 4))
plt.title('Reconstructed Images')

print("Original Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(imgs[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Noisy Images")
#noisy images
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(x_test_noisy[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Reconstruction of Noisy Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(recon_img[i, ..., 0], cmap='gray')    
plt.show()    

# writer.close()

sess.close()
Epoch: 1/5... Training loss: 0.4987 Validation loss: 0.4987
Epoch: 2/5... Training loss: 0.2260 Validation loss: 0.2254
Epoch: 3/5... Training loss: 0.1568 Validation loss: 0.1574
Epoch: 4/5... Training loss: 0.1368 Validation loss: 0.1364
Epoch: 5/5... Training loss: 0.1256 Validation loss: 0.1256
<matplotlib.figure.Figure at 0x16a00539b38>
Original Images
Noisy Images
Reconstruction of Noisy Images

Observation:

Decreasing the noise from 0.5 to 0.3 did not help the autoencoder. The loss trend for the training and validation set remained the same, but noise did not improve the ability of the decoder to generate the image better. Hence, reducing noisehyper parametr from 0.5 to 0.3 did not help the autoencoder

Hyper Parameter Tuning using the Autoencoder

Lets try a learning rate of 0.01 . Initla code remains same. Observe comments for changes

In [20]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [21]:
def reset_graph(seed=2018):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

# Place holders for the network

inputs_ = tf.placeholder(tf.float32,[None,28,28,1])
targets_ = tf.placeholder(tf.float32,[None,28,28,1])

# Activation Function Relu
def lrelu(x,alpha=0.1):
    return tf.maximum(alpha*x,x)

### Encoder
with tf.name_scope('en-convolutions'):
    conv1 = tf.layers.conv2d(inputs_,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv1')
# Now 28x28x32
with tf.name_scope('en-pooling'):
    maxpool1 = tf.layers.max_pooling2d(conv1,pool_size=(2,2),strides=(2,2),name='pool1')
# Now 14x14x32
with tf.name_scope('en-convolutions'):
    conv2 = tf.layers.conv2d(maxpool1,filters=32,kernel_size=(3,3),strides=(1,1),padding='SAME',use_bias=True,activation=lrelu,name='conv2')
# Now 14x14x32
with tf.name_scope('encoding'):
    encoded = tf.layers.max_pooling2d(conv2,pool_size=(2,2),strides=(2,2),name='encoding')
# Now 7x7x32.
#latent space

# Now 7x7x32.
#latent space


### Decoder
with tf.name_scope('decoder'):
    conv3 = tf.layers.conv2d(encoded,filters=32,kernel_size=(3,3),strides=(1,1),name='conv3',padding='SAME',use_bias=True,activation=lrelu)
#Now 7x7x32        
    upsample1 = tf.layers.conv2d_transpose(conv3,filters=32,kernel_size=3,padding='same',strides=2,name='upsample1')
# Now 14x14x32
    upsample2 = tf.layers.conv2d_transpose(upsample1,filters=32,kernel_size=3,padding='same',strides=2,name='upsample2')
# Now 28x28x32
    logits = tf.layers.conv2d(upsample2,filters=1,kernel_size=(3,3),strides=(1,1),name='logits',padding='SAME',use_bias=True)
#Now 28x28x1
# Pass logits through sigmoid to get reconstructed image
    decoded = tf.sigmoid(logits,name='recon')

#Defining the learning rate and cost

loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=targets_)

learning_rate=tf.placeholder(tf.float32)
cost = tf.reduce_mean(loss)  #cost
opt = tf.train.AdamOptimizer(learning_rate).minimize(cost) #optimizer


# Training

sess = tf.Session()
#tf.reset_default_graph()

# saver = tf.train.Saver()
loss = []
valid_loss = []
epoch_list=[]



display_step = 1
epochs = 5
batch_size = 64
#lr=[1e-3/(2**(i//5))for i in range(epochs)]
#learning rate value
#chnage learning rate value
lr=0.01

# Start the Tensorflow Session
sess.run(tf.global_variables_initializer())
# writer = tf.summary.FileWriter('./graphs', sess.graph)
for e in range(epochs):
    total_batch = int(mnist.train.num_examples/batch_size)
    for ibatch in range(total_batch):
        batch_x = mnist.train.next_batch(batch_size)
        batch_test_x= mnist.test.next_batch(batch_size)
        imgs_test = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in the test set
        noise_factor = 0.5
        x_test_noisy = imgs_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_test.shape) 
        x_test_noisy = np.clip(x_test_noisy, 0., 1.)
        imgs = batch_x[0].reshape((-1, 28, 28, 1))
#Inducing noise in training set
        x_train_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
        x_train_noisy = np.clip(x_train_noisy, 0., 1.)
# Loss for the training set
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: x_train_noisy,
                                                         targets_: imgs,learning_rate:lr})
#loss for the testing set   
        batch_cost_test = sess.run(cost, feed_dict={inputs_: x_test_noisy,
                                                         targets_: imgs_test})
    if (e+1) % display_step == 0:
        print("Epoch: {}/{}...".format(e+1, epochs),
                  "Training loss: {:.4f}".format(batch_cost),
                 "Validation loss: {:.4f}".format(batch_cost_test))
   
        loss.append(batch_cost)
        valid_loss.append(batch_cost_test)
        epoch_list.append(e)

#plotting the validation and training loss
plt.plot(epoch_list, loss, 'bo', label='Training loss')
plt.plot(epoch_list, valid_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.legend()
plt.figure()
plt.show()
#     saver.save(sess, 'encode_model') 

#understanding the output for the testing set
# printing origibal, noise induced and generated images
batch_x= mnist.test.next_batch(3)
#inducing noise for the testing set
imgs = batch_x[0].reshape((-1, 28, 28, 1))
noise_factor = 0.5
x_test_noisy = imgs + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs.shape) 
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
recon_img = sess.run([decoded], feed_dict={inputs_: x_test_noisy})[0]
plt.figure(figsize=(20, 4))
plt.title('Reconstructed Images')

print("Original Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(imgs[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Noisy Images")
#noisy images
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(x_test_noisy[i, ..., 0], cmap='gray')
plt.show()    
plt.figure(figsize=(20, 4))
print("Reconstruction of Noisy Images")
for i in range(3):
    plt.subplot(2, 10, i+1)
    plt.imshow(recon_img[i, ..., 0], cmap='gray')    
plt.show()    

# writer.close()

sess.close()
Epoch: 1/5... Training loss: 0.0977 Validation loss: 0.0965
Epoch: 2/5... Training loss: 0.0997 Validation loss: 0.0997
Epoch: 3/5... Training loss: 0.0996 Validation loss: 0.0981
Epoch: 4/5... Training loss: 0.0993 Validation loss: 0.0996
Epoch: 5/5... Training loss: 0.0999 Validation loss: 0.1006
<matplotlib.figure.Figure at 0x16a5ec00cc0>
Original Images
Noisy Images
Reconstruction of Noisy Images

Observation :

Using a learning rate of 0.01 skewed the losses completely. The training and validation loss are no longer consistently decreasing. The losses finally reduce for the training set , but not in alignment with testing set. Lets compare the quality of images with learning rate 1e-5 and 0.01 learning rate 1e-5

0.0999 Validation loss: 0.1006

image.png

The learning rate 0.01 provided defintely more clarity than the learning rate 1e-5.

Observation:

It is observed that the Autoencoder completely misses the pixel values required for 5 epochs for a learning rate of 0.1. Here since the learning rate is high it quickly traverses the loss scenario, missing the pixel values during traversal. Hence, a learning rate of 0.01 is most optiml for the autoencoder. Additionally, if the Losses graph is compared there is a steep increase in the loss as well. Hence , learning rate also affects the Denoising Autoencoder and a parameter that should be tuned

Final results for Autoencoder

Tuning the Autoencoder depicts that the hyper parameter noise, loss function as Hinge Loss and Sigmoid Cross Entropy seem to work the best. Reducing the noise helped improving the loss as presumably the auto encoder can learn the image better. Additionally, learning rate of 0.01 provided a very low loss 0.0999 and 0.1006 for the encoder and decoder respectively. Hence, the prospective hyper parameters to tune first while tuning an Autoencoder are as follows:

  1. Activation Function
  2. Noise Factor
  3. Learning Rate
  4. Optimizer

Summary

image.png

References

MLP Neural Network

The MLP code is based on the following:

[1] Tomar,Nikhil. “Iris Datset Classifivation using Tensorflow”. github.io, October 11. 2017. Web. 22 April. 2018. (no licence specified) Web Link :

https://github.com/nikhilroxtomar/Iris-Data-Set-Classification-using-TensorFlow-MLP/blob/master/iris.py

http://idiotdeveloper.tk/iris-data-set-classification-using-tensorflow-multilayer-perceptron/

Running the Tensorflow Session is based on the following article:

[2] Vikram K. “Deep Learning”. github.io, August 28. 2016. Web. 22 April. 2018. (no licence specified)

Weblink : https://github.com/Vikramank/Deep-Learning-/blob/master/Iris%20data%20classification.ipynb

.youtube.com/watch?v=a5BUunInTQU&t=1227s

[3] Understanding basics of Deep Neural Networks using the following videos by MIT

Deep Learning - https://www.youtube.com/watch?v=JN6H4rQvwgY&feature=youtu.be

CNN

[1] Understanding Xavier Initialization

Weblinks:

http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

[2] HVASS Laboratories - Tensorflow Tutorials 06 - CIFAR10

Weblink :

https://www.youtube.com/watch?v=3BXfw_1_TF4

[3] Magnus Erik Hvass Pedersen. “Tensorflow Tutorials”. github.io, Licenced by MIT ,December 16th. 2016. Web. 23 April. 2018.

Used for visualizing the CIFAR 10 Data

Weblink : https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/06_CIFAR-10.ipynb

[4] Ataspinar. “Building Convolutional Neural Networks ith Tensorflow”. github.io, December 16th. 2016. Web. 23 April. 2018.

LENET-5 and Like CNN code is based on the following article https://github.com/taspinar/sidl

[5] Ataspinar. “Building Convolutional Neural Networks ith Tensorflow”., December 16th. 2016. Web. 23 April. 2018.

Web Link: http://ataspinar.com/2017/08/15/building-convolutional-neural-networks-with-tensorflow/

[6] https://en.wikipedia.org/wiki/Convolutional_neural_network [7] https://en.wikipedia.org/wiki/Hinge_loss

RNN

[8] The data is accessed from the following link

https://github.com/sumit-kothari/AlphaNum-HASYv2

[9] Jasdeep06. “Understanding-LSTM-in-Tensorflow-MNIST”. github.io, 10 Sept. 2017. Web. 18 April. 2018. The netwrok Structure and exploratory data analysis functions is based from the following link

https://jasdeep06.github.io/posts/Understanding-LSTM-in-Tensorflow-MNIST/

[10] Kothari,Sumit. “Alpha- Numeric Handwritten Dataset”. , 10 Sept. 2017. Web. 18 April. 2018.

Data preprocessing is based on the below link

https://www.kaggle.com/usersumit/basic-eda-keras-ann-model/notebook

[11] Other references

Chentinghao,Tinghao. “Tensorflow RNN Tutorial MNIST”.,Medium, 9th January. 2018. Web. 18 April. 2018.

https://medium.com/machine-learning-algorithms/mnist-using-recurrent-neural-network-2d070a5915a2

https://github.com/chentinghao/tinghao-tensorflow-rnn-tutorial/blob/master/mnist_rnn.ipynb

[12] Implement Tensorflow Next Batch, Stack Overflow

https://stackoverflow.com/questions/40994583/how-to-implement-tensorflows-next-batch-for-own-data?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

[13] Background Reserach on RNN LSTM done using the below article

https://deeplearning4j.org/lstm.html

RBM

[14] Omid Alemi. “Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow”.,Licenced under MIT Licence,Medium, 15th August. 2017. Web. 23 April. 2018.

The code is based on code by Omid Alemi licenced under MIT

Website: https://github.com/patricieni/RBM-Tensorflow/blob/master/Gaussian%20RBM.ipynb https://github.com/omimo/xRBM/tree/master/examples

[15] MNist dataset Decription is based from this article

Website : http://corochann.com/mnist-dataset-introduction-1138.html https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine

GAN

The code has been based on the below blog post with custom addition of functions and illustartions

[16] Kristiadi's,Agustinus. “Generative Adversarial Network using Tensorflow”.,Setember 17th 2016,. Web. 19 April. 2018. Weblinks :

https://wiseodd.github.io/techblog/2016/09/17/gan-tensorflow/

https://github.com/wiseodd/generative-models/blob/master/GAN/softmax_gan/softmax_gan_tensorflow.py

https://wiseodd.github.io/page3/

[17] Back ground reserach

https://en.wikipedia.org/wiki/Generative_adversarial_network

Autoencoder

[18] The code has been based on the below blog and custom changes have been made to this code to incorporate teh requirements of the project Sharma,Aditya. “Understanding Autoencoders using Tensorflow”.,LearnOpenCV,November 15th 2017,. Web. 20 April. 2018. Website : https://www.learnopencv.com/understanding-autoencoders-using-tensorflow-python/

[19] Mallick,Satya(spmallick),“Denoising-Autoencoder-using-Tensorflow.”,GitHub,LearnOpenCV,November 26th 2017,. Web. 20 April. 2018. Website:https://github.com/spmallick/learnopencv/tree/master/DenoisingAutoencoder

https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits https://www.tensorflow.org/api_docs/python/tf/losses/sigmoid_cross_entropy

In [ ]: